Abstract

Large observational studies have become commonplace in medical research. Treatment 
may be adapted to covariates at several instances without a fixed protocol. Estimation or 
even definition of treatment effect is difficult in that case. Treatment influences covariates, 
which influence treatment, which influences covariates, etcetera. Thus, if a patient is doing 
fine, that could mean either that the patient is "strong" or that the medication had a good 
effect. To distinguish between these options, even the famous time-dependent Cox-model 
cannot be used. It estimates the rate at which the event of interest happens (e.g., the 
patient dying) given past treatment and covariate history, but the net effect cannot be
derived from just this rate; see Robins (1998), Keiding (1999), Lok (2001) or Section 2
below.

Robins (1992, 1998), Keiding (1999) and Lok (2001, 2004) study Structural Nested
Models to estimate treatment effects even in this difficult setting. Their methods are 
based on so-called counterfactuals: the outcome a patient would have had if treatment 
was withheld after a certain time. It is clearly impossible for these outcomes to be 
observed in all patients. Yet we will show how counterfactual thinking is a very helpful 
tool to study estimation of treatment effect in the presence of time-dependent covariates. 
Previous work on these models was usually based on the assumption that the correct 
model combined with observations made it possible to calculate all counterfactuals for 
each patient. This assumption was considered not plausible, since it assumes the exact
same treatment effect for each patient (see e.g. Holland (1986), Robins (1992, 1998) or
Keiding (1999)). This paper provides the cornerstone for the relaxation of treatment effects
in e.g. Robins (1992, 1998) or Keiding (1999) in that, at least if there is no censoring, the
assumption that counterfactuals are connected with the observed data in a deterministic 
way is not necessary. We hope that this will contribute to the discussion about causal 
reasoning. 
1 Causal inference and counterfactuals

Except in randomized experiments, statisticians have usually restricted themselves to the
modelling of association, and warned against using statistical models for causal inference.
For lack of alternatives, and since causal questions are often of great interest, models for
association are, in spite of these warnings, often interpreted in a causal way. This leads
Spirtes et al. (1993) to ask: "Why is so much of statistical application and so little of
statistical theory concerned with causal inference?"

A controversy that often returns in the current research on causality is whether or not
to think in terms of so-called counterfactuals. We explain this concept with an example.
Suppose that we want to compare two treatment regimes, say $g_1$ (e.g.: always treat) and
$g_2$ (e.g.: never treat). Consider for the moment just one patient, and consider the outcomes
that this patient would have had if he or she had been treated according to regime $g_1$ and
regime $g_2$; denote them by $Y^{g_1}$ and $Y^{g_2}$, respectively. These outcomes can of
course never both be observed. $Y^{g_1}$ and $Y^{g_2}$ are counterfactuals: they are the
outcomes in case, possibly counter to fact, the patient would have been treated according to
regimes $g_1$ and $g_2$, respectively.

Objections to the use of counterfactuals seem to concentrate on the impossibility to say 
exactly what is "the closest possible world to this one in which ..." On the other hand, 
many will argue that carrying out the thought experiment "what would have happened if" 
and making the prediction "what would happen if" are exactly what scientific research is 
all about. Of course, in order to do so one has to have a clear idea about the setting in 
which the question is asked, and the more far-fetched or unrealistic are the predicates, the 
less convincing are the predictions. 



We will not go deeply into this controversy. Gill and Robins (2001) show that, at least
in the discrete-time case, the counterfactuals are "free" in the sense that they place no
restrictions on the distribution of the observed variables. For the continuous time case that
we study, such a result has not been found. Pearl and Spirtes et al. (1993) use graphs to
study causality. These graphs can always be expressed in terms of counterfactuals, and I am
unable to avoid seeing counterfactuals behind those graphs. I consider counterfactuals a very
natural and useful framework to think about causal questions, and have not made attempts
to avoid them.

Another controversy in the causal literature is that counterfactuals are often supposed 
to be connected with the data in a deterministic way: if (the parameters in) the model 
would be known, the counterfactual outcomes for each patient could simply be calculated. 
Consequently, treatment is then said not to affect the outcome of interest if the outcome 
for any particular patient would be exactly the same whatever treatment was given. This is 
usually called the "sharp null hypothesis". This assumption of deterministic dependence is
related to the assumption of constant effect as explained in Holland (1986), section 4.4,
which says that the difference between various counterfactual outcomes is a constant which
is the same for each patient.

This assumption of constant treatment effect is not attractive, and it may be an impor- 
tant reason for statisticians to avoid counterfactuals. Constant treatment effect can never be 
tested, since only one outcome is observed for each patient, but it is not likely to be true. If 
possible at all, an assumption of constant treatment effect should not be made. Fortunately, 
this article shows that an assumption of constant treatment effect is not necessary.
Distributional assumptions suffice, as was conjectured in Robins (1998) and is the main
result in the current article.



2 Confounding by indication in time 

As a thought experiment, suppose that we want to investigate the effect of a medicine meant 
to lower the blood pressure. Also, suppose that this medicine has no effect whatsoever, but 
this is unknown to the investigator and subject of investigation. If the doctors expect this 
medicine to be effective and the investigator does not interfere in the treatment assignment it 
can be expected that the doctors give larger doses of medication to patients with the highest 
blood pressure: those are the ones who need it most. A straightforward comparison then 
leads to the following. In the group which received larger doses, patients with a relatively 
high blood pressure will be over-represented. Thus one can expect that at the end of the 
study the blood pressure in the group with higher doses is even higher than in the group with 
lower doses. The obvious conclusion from observing just this association would be that this 
medicine is harmful. But we started this thought experiment with a medicine that had no 
effect whatsoever! 
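
The thought experiment can be replayed numerically. The following is a minimal simulation
sketch, not part of the original analysis: the baseline distribution, the noise level, the
rule "patients above the median baseline get the higher dose" and all variable names are
assumptions chosen for illustration. The medicine has no effect by construction, yet the
naive comparison of final blood pressure suggests harm, whereas a comparison of the changes
from baseline does not.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the sketch is reproducible

n = 20_000
# Assumed baseline blood pressure distribution (illustrative numbers only).
baseline = [random.gauss(150, 15) for _ in range(n)]

# Doctors give the higher dose to the patients with the highest blood pressure.
cutoff = statistics.median(baseline)
high_dose = [b > cutoff for b in baseline]

# The medicine has no effect whatsoever: final value = baseline + noise.
final = [b + random.gauss(0, 5) for b in baseline]


def group_mean(values, flags, wanted):
    """Mean of the values whose flag equals `wanted`."""
    return statistics.fmean(v for v, f in zip(values, flags) if f == wanted)


# Naive comparison of final blood pressure between the dose groups.
naive_diff = group_mean(final, high_dose, True) - group_mean(final, high_dose, False)

# Comparison of the change from baseline, which accounts for the initial state.
change = [f - b for f, b in zip(final, baseline)]
change_diff = group_mean(change, high_dose, True) - group_mean(change, high_dose, False)

print(f"naive difference in final blood pressure: {naive_diff:.1f}")
print(f"difference in change from baseline:       {change_diff:.2f}")
```

The first number is large and positive although the medicine does nothing; the second is
close to zero. This only covers confounding at the start of treatment.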

In this thought experiment the distortion is caused by the fact that in the higher-doses 
group the patients with a very high blood pressure are over-represented. In statistical terms 
this is a special case of Simpson's paradox; epidemiologists call it selection bias or
confounding by indication. See e.g. Pearl (2000, 1999) and Greenberg et al. (1993),
respectively. This phenomenon often occurs in observational data.

If this confounding by indication only takes place at the start of the treatment, one can 
condition on initial patient characteristics in order to remove the problem and get meaningful 
estimates of the effect of treatment. One can e.g. use regression, logistic regression or Cox 
regression for that. This is common knowledge. However, if treatment decisions after the 
start of the treatment are influenced by the state of the patient at this later time, much less 
is known. Treatment might be influenced by a patient's state in the past and also (if it has 
an effect) treatment itself may be a predictor of a patient's state in the future (in the sense 
of correlation). Sometimes this makes it impossible to distinguish between treatment effect 
and selection bias. 

[Diagram: Confounding by indication. Treatment and the state of the patient influence each
other; each may change at any time. How to distinguish between treatment effect and the
effect of the state of the patient (selection bias)?]



To illustrate this we use a hypothetical example from Robins (1994, 1997), see Figure 1
below. Suppose that a group of 32,000 AIDS patients is randomized at time $\tau_0$ to either
AZT-treatment ($A_0 = 1$, 16,000 patients) or no treatment ($A_0 = 0$, 16,000 patients). Our
interest is in the effect of AZT-treatment on survival. At time $\tau_1$, after $\tau_0$,
PCP-status is measured; PCP, Pneumocystis pneumonia, is a common opportunistic infection of
AIDS patients. In this example, the effect of AZT on survival is confounded by the use of a
second medication: PCP-prophylaxis. Patients who developed PCP at least once at time $\tau_1$
($L_1 = 1$) always receive PCP-prophylaxis at time $\tau_1$ ($A_1 = 1$). Only half of the
patients without PCP ($L_1 = 0$) receive PCP-prophylaxis ($A_1 = 1$); the other half does not
receive PCP-prophylaxis ($A_1 = 0$) (assignment by randomization). In the end, at time
$\tau_2$, the number of survivors is recorded in each group. For the outcomes see Figure 1
below.




[Figure: tree diagram of the outcomes per group; 320 indicates 320 x 100 patients]

Figure 1: Confounding by PCP-status

In this hypothetical example large numbers of patients have been chosen to indicate that,
at this stage, we are not interested in random variation, only in confounding and bias. Every
patient who was not treated with AZT had developed PCP at least once at time $\tau_1$, and
only half of the patients who were treated with AZT had developed PCP at time $\tau_1$.
Because the patients were randomized over AZT treatment at time $\tau_0$ it is correct causal
statistical reasoning to infer that AZT prevents PCP. Also, we see that patients who received
AZT did worse in the end than patients who did not receive AZT: in Figure 1 we add up groups
b, c and d to find 80 (80) in the AZT-group and 100 (60) in the group without AZT (group a),
so of the patients treated with AZT 8,000 survived and 8,000 died, as opposed to 10,000
survivors and 6,000 deaths in the group of patients who did not receive AZT. That patients
who received AZT did worse in the end than patients who did not receive AZT remains true
if we restrict to patients who received PCP-prophylaxis, 70 (50) with AZT (groups b and c)
versus 100 (60) without AZT, and if we restrict to patients who developed PCP, 40 (40) with
AZT versus 100 (60) without AZT. A naive conclusion could be that AZT is harmful, even when
PCP is taken into account.

However, as Robins points out in e.g. Robins (1987); Lok et al. (2004), the above
observations are due to confounding (by PCP-status) and not to causation, and a causal
analysis should be done as follows. Obviously, PCP-prophylaxis has a good effect in patients
without PCP at time $\tau_1$: 30 (10) with prophylaxis (group c) versus 10 (30) without
prophylaxis (group d). Patients who were not treated with AZT all received PCP-prophylaxis,
so the natural comparison is between the treatment regime of giving both AZT at time $\tau_0$
and PCP-prophylaxis at time $\tau_1$ and the treatment regime of both withholding AZT at time
$\tau_0$ and giving PCP-prophylaxis at time $\tau_1$. This would still lead to 10,000
survivors in the group without AZT. In the other group, we have to find out what would have
happened with the 4,000 patients who did not receive PCP-prophylaxis, group d in Figure 1.
See also Figure 2. Note that at time $\tau_1$ group d is still comparable with group c. Thus,
with PCP-prophylaxis, 3,000 of them would have survived. This leads to Figure 2. Therefore
the treatment regime of giving both AZT at time $\tau_0$ and PCP-prophylaxis at time $\tau_1$
would lead to 10,000 survivors. That is the same as the treatment regime of giving no AZT at
time $\tau_0$ and giving PCP-prophylaxis at time $\tau_1$. Thus it turns out that in this
example AZT-treatment does not affect death at time $\tau_2$, although the remarks in the
former paragraph suggested otherwise. The above analysis is a specific case of what is called
the G-computation formula, see e.g. Robins (1987); Lok et al. (2004).

[Figure: tree diagram with columns AZT-treatment (at $\tau_0$), PCP-status and
PCP-prophylaxis (at $\tau_1$), and survivors at $\tau_2$ (deaths at $\tau_2$); 320 indicates
320 x 100 patients]

Figure 2: Confounding by PCP-status and G-computation algorithm
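
The arithmetic of this example can be written out as a small computation. The sketch below
is illustrative only: the function name and the data layout are hypothetical, and the counts
per group are the ones quoted in the text (the count for group b follows from the total
70 (50) for groups b and c together). It standardizes the survival probability under a
fixed prophylaxis decision over the PCP-status distribution observed in each AZT arm.

```python
# (AZT A0, PCP-status L1, prophylaxis A1) -> (survivors, deaths), in patients
counts = {
    (0, 1, 1): (10_000, 6_000),  # group a: no AZT, PCP, prophylaxis
    (1, 1, 1): (4_000, 4_000),   # group b: AZT, PCP, prophylaxis
    (1, 0, 1): (3_000, 1_000),   # group c: AZT, no PCP, prophylaxis
    (1, 0, 0): (1_000, 3_000),   # group d: AZT, no PCP, no prophylaxis
}


def expected_survivors(counts, a0, a1):
    """Expected number of survivors in AZT arm a0 if everybody had received a1.

    For each PCP-status l1, the survival fraction observed among patients with
    (a0, l1, a1) is applied to all patients with (a0, l1).
    """
    total = 0.0
    for l1 in (0, 1):
        n_l1 = sum(s + d for (x0, x1, _), (s, d) in counts.items()
                   if x0 == a0 and x1 == l1)
        if n_l1 == 0:
            continue  # this PCP-status does not occur in this arm
        survivors, deaths = counts[(a0, l1, a1)]
        total += n_l1 * survivors / (survivors + deaths)
    return total


naive_azt = sum(s for (x0, _, _), (s, _) in counts.items() if x0 == 1)
naive_no_azt = sum(s for (x0, _, _), (s, _) in counts.items() if x0 == 0)
print(naive_azt, naive_no_azt)               # crude survivors per arm
print(expected_survivors(counts, 1, 1))      # AZT and prophylaxis for everybody
print(expected_survivors(counts, 0, 1))      # no AZT, prophylaxis for everybody
```

The crude comparison gives 8,000 versus 10,000 survivors, while both standardized regimes
give 10,000 survivors, in line with the conclusion above.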



PCP in this example is what epidemiologists call a marker: it does not cause death itself, 
but it indicates that the patient concerned belongs to a group with a worse disease prognosis. 
It is a confounder for the effect of AZT on survival since, given the past, 1. it predicts 
subsequent treatment and 2. it predicts the outcome of interest, even given that subsequent 
treatment. 

We conclude that the information about patient prospects used for treatment decisions 
(both patient characteristics or covariates and past treatment history) has to be taken into 
account if one wants to estimate treatment effect. This has to be done carefully, since the 
patient characteristics used for treatment decisions may be influenced by past treatment. Thus 
they may themselves be indications of treatment effect, and in this case simply conditioning 
on them can also lead to false conclusions. E.g., in the example of Figure 1 above, simply
conditioning on PCP-prophylaxis or PCP-status leads to false conclusions about the effect of 
AZT-treatment. 



3 No unmeasured confounding 

Structural nested models, proposed in Robins (1989); Lok et al. (2004); Robins (1992, 1998)
to solve practical problems in epidemiology and biostatistics, effectively overcome these
difficulties and estimate the effect of time-varying treatments. The main assumption
underlying these models is that all information the doctors used to make treatment decisions,
and which is predictive of the patient's prognosis with respect to the final outcome, is
available for analysis. This assumption of "no unmeasured confounding" makes it possible to
distinguish between treatment effect and selection bias. Without some such assumption that is
impossible, which is why this assumption of no unmeasured confounding cannot be tested
statistically.
What data have to be collected to satisfy this assumption of no unmeasured confounding is for 
subject matter experts to decide. All of the past treatment- and covariate information which 
both 1. influences a doctor's treatment decisions and 2. is relevant for a patient's prognosis 
with respect to the outcome of interest, has to be recorded. 

Note that it is not necessary that everything is recorded which predicts treatment alloca- 
tion: treatment may have been given or withheld for reasons which had nothing to do with 
the patients' prognosis with respect to the outcome of interest. For example, a patient could 
have an allergic reaction, e.g. badly swollen eyes, to the study medication. If this allergic 
reaction is believed to be no indication for his or her prognosis with respect to the outcome 
of interest, e.g. survival, it is no confounder in the sense of the preceding paragraphs. 

Of course the relevant treatment- and covariate information should not determine the 
treatment decisions in a deterministic way. Variation in treatment is necessary to study the 
effect of different treatment decisions. This is not surprising: if a fixed treatment protocol is 
followed one cannot say what would have happened under different treatment regimes unless 
one makes strong assumptions on how treatment affects the outcome. 

If this variation is not planned (as it would be in a randomized clinical trial), the natural
variation in treatment has to be exploited. Such variation can occur for several reasons,
e.g. variation from doctor to doctor, treatment policies that change over time, availability
of medication, or any other natural variation, as long as it does not contradict the
assumption of "no unmeasured confounding". For example, if treatment decisions are different
in one district
than in another, "district" can be a source of natural variation. But if patients from the 
different districts with the same covariate values are not comparable with respect to their 
disease prognosis, the district has to be included in the covariates since it is a confounder in 
the above sense, and this variation is a hindrance, not a help. 



4 Setting and estimation 

Structural nested models are models for relations between so-called counterfactuals. Consider
for a moment just one patient. In reality this patient received a certain treatment and had
final outcome $Y$. If his or her actual treatment had been stopped at time $t$, the patient's
final outcome would probably have been different. The outcome he or she would have had in
that case we call $Y^{(t)}$. Of course, $Y^{(t)}$ is generally not observed, because the
patient's actual treatment after $t$ is usually different from no treatment; it is a
counterfactual outcome. Instead of stopping treatment one can also consider switching to some
kind of baseline treatment, e.g., standard treatment. Figure 3 illustrates the nature of
counterfactual outcomes. The current article deals with mimicking these counterfactual
outcomes using the available data. This is an important result underlying structural nested
models in continuous time.

[Figure: three panels plotting treatment against time (0 indicates "no treatment"), each with
its outcome: observed treatment, with outcome Y; treatment stopped at t; treatment stopped
at s]

Figure 3: Observed and counterfactual outcomes



Structural nested models in continuous time allow for both changes in the values of the
covariates and treatment decisions to take place at arbitrary times for different patients.
They are proposed in Robins (1992, 1998). Robins (1992, 1998) conjectures how the statement
that "treatment does not affect the outcome of interest" can be expressed in terms of the
model. Lok (2001, 2004) provides a conceptual framework and mathematical formalization of
these practical methods. In particular, Lok proves the conjectures in Robins (1998) that
structural nested models in continuous time lead to estimators which are both consistent
and asymptotically normal. Besides that, a test for whether treatment affects the outcome
of interest can be carried out without specifying a model for treatment effect. Lok also
shows that a subclass of these estimators can be considered from a partial likelihood point
of view, and that estimators in that subclass can often be calculated with standard software,
by using that software in a non-standard way as proposed in e.g. Robins (1998). Moreover, a
subclass of the tests for whether treatment affects the outcome studied by Lok can be
considered from a partial likelihood point of view, and can be carried out with standard
software. The result on mimicking counterfactual outcomes proved in the current article is a
cornerstone for Robins (1992, 1998) and Lok (2001, 2004).

In the previous literature, continuous-time applications have been carried out under the
assumption that counterfactuals are connected with the observed data in a deterministic way:
if (the parameters in) the model were known, the counterfactual outcomes for each patient
could simply be calculated from the observed data (see (3) with ~ replaced by =). See
e.g. Robins et al. (1992), Mark and Robins (1993), Witteman et al. (1998) and Keiding et al.
(1999). This is a very strong condition, which, though untestable, is generally considered
implausible. The previous literature (see e.g. Robins et al., 1992, Mark and Robins, 1993
and Keiding et al., 1999) recognized this, and conjectured that this assumption could be
relaxed, since it was known that this assumption could be relaxed for structural nested models
in discrete time (see e.g. Lok et al., 2004). The result on mimicking counterfactual outcomes
proved in the current article makes it possible to relax the specification of the
counterfactual outcomes as deterministically dependent on observed variables, and allows for
a distributional interpretation of the estimators.



5 Setting and notation 

The setting to which structural nested models in continuous time apply is as follows. The
outcome of interest is a continuous real variable $Y$, for example a patient's survival time,
time to clinical AIDS, or number of white blood cells after the treatment period. Our
objective is to estimate the effect of treatment on $Y$. There is some fixed time interval
$[0, \tau]$, with $\tau$ a finite time, during which treatment and patient characteristics
are observed for each patient. We suppose that after time $\tau$, treatment is stopped or
switched to some kind of baseline treatment. This article assumes that there is no censoring,
and $Y$ is observed for every patient in the study. See e.g. Robins (1998) for ideas about
dealing with censoring.



We denote the probability space by $(\Omega, \mathcal{F}, P)$. The covariate process describes
the course of the disease of a patient, e.g. the course of the blood pressure and the white
blood cell count. We assume that a realization of this covariate process is a function from
$[0, \tau]$ to $\mathbb{R}^d$, and that such a sample path is continuous from the right with
limits from the left (cadlag).
The covariates which must be included are those which both (i) influence a doctor's treatment 
decisions and (ii) possibly predict a patient's prognosis with respect to the outcome of interest. 
If such covariates are not observed the assumption of no unmeasured confounding, mentioned 
in the introduction, will not hold. 

For the moment consider a single patient. We write $Z(t)$ for the covariate- and treatment
values at time $t$. We assume that $Z(t)$ takes values in $\mathbb{R}^m$, and that
$Z(t) : \Omega \to \mathbb{R}^m$ is measurable for each $t \in [0, \tau]$. Moreover, we assume
that $Z$, seen as a function on $[0, \tau]$, is cadlag, and that with probability one this
function has only finitely many jumps. We also assume that the probability that the covariate-
and treatment process $Z$ jumps at $t$ equals 0 for every fixed time $t$, except possibly for
finitely many fixed times $t$. We write $Z_t = (Z(s) : 0 \le s \le t)$ for the covariate- and
treatment history until time $t$, and $\mathcal{Z}_t$ for the space of cadlag functions from
$[0, t]$ to $\mathbb{R}^m$ in which $Z_t$ takes its values. Similarly we write $Z$ for the
whole covariate- and treatment history of the patient on the interval $[0, \tau]$, and
$\mathcal{Z}$ for the space in which $Z$ takes its values. As the $\sigma$-algebra on
$\mathcal{Z}_t$ and $\mathcal{Z}$ we choose the projection $\sigma$-algebra; measurability of
$Z(s)$ for each $s \le t$ is then equivalent with measurability of the random variable $Z_t$.

Counterfactual outcomes were already mentioned in the introduction. We suppose that
all counterfactual outcomes $Y^{(t)}$, for $t \in [0, \tau]$ and for all patients, are random
variables on the probability space $(\Omega, \mathcal{F}, P)$.

We suppose that observations and counterfactual outcomes on different patients are inde- 
pendent. 
6 The model for treatment effect

Structural nested models in continuous time model distributional relations between $Y^{(t)}$
and $Y^{(t+h)}$, for $h > 0$ small, through a so-called infinitesimal shift-function $D$.
Write $F$ for the cumulative distribution function and $F^{-1} : (0, 1) \to \mathbb{R}$ for
its generalized inverse

$$F^{-1}(p) = \inf \{x : F(x) \ge p\}.$$

Then the infinitesimal shift-function $D$ is defined as

$$D(y, t; Z_t) = \frac{\partial}{\partial h}\Big|_{h=0}
  \left(F^{-1}_{Y^{(t+h)}|Z_t} \circ F_{Y^{(t)}|Z_t}\right)(y),$$

the right-hand derivative of the quantile-quantile transform which moves quantiles of the
distribution of $Y^{(t)}$ to quantiles of the distribution of $Y^{(t+h)}$ ($h > 0$), given the
covariate- and treatment history until time $t$, $Z_t$.
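
To make the generalized inverse and the quantile-quantile transform concrete, here is a
minimal sketch for empirical distribution functions. It is an illustration under simplifying
assumptions, not the estimator used in this literature: the conditioning on $Z_t$ is ignored,
the two samples stand in for draws of $Y^{(t)}$ and $Y^{(t+h)}$, and all function names are
hypothetical.

```python
import bisect


def make_cdf(sample):
    """Empirical cumulative distribution function F(x) = #{sample values <= x} / n."""
    xs = sorted(sample)

    def cdf(x):
        return bisect.bisect_right(xs, x) / len(xs)

    return cdf


def make_generalized_inverse(sample):
    """Generalized inverse F^{-1}(p) = inf{x : F(x) >= p} of the empirical cdf."""
    xs = sorted(sample)
    cdf = make_cdf(sample)

    def inverse(p):
        # p = 1 is allowed here for convenience; the text uses the open interval (0, 1).
        if not 0 < p <= 1:
            raise ValueError("p must lie in (0, 1]")
        for x in xs:  # the infimum is attained at a sample point
            if cdf(x) >= p:
                return x

    return inverse


def qq_transform(sample_t, sample_t_plus_h):
    """Map y to F^{-1}_{t+h}(F_t(y)): quantiles of one sample to those of the other."""
    cdf_t = make_cdf(sample_t)
    inverse_t_plus_h = make_generalized_inverse(sample_t_plus_h)
    return lambda y: inverse_t_plus_h(cdf_t(y))


inverse = make_generalized_inverse([1, 2, 3, 4])
shift = qq_transform([1, 2, 3, 4], [11, 12, 13, 14])
print(inverse(0.5), inverse(0.51), shift(2))
```

In this toy case the second sample is the first one shifted by 10, and the transform moves
the median 2 of the first sample to the median 12 of the second.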

Example 6.1 Effect of Graft versus Host Disease (GvHD) on time to leukemic relapse.
Keiding (1999) and Keiding et al. (1999) describe an experiment to study the effect of GvHD
on time to leukemic relapse in patients who had Bone Marrow Transplantation (BMT). Infection
with CMV (Cytomegalovirus) is a time dependent confounder: an independent prognostic
factor for relapse that both 1. predicts the subsequent development of the exposure GvHD and
2. is predicted by past exposure GvHD. Write Y for the time until leukemic relapse, and write
G for the time until GvHD. Assume that Y is observed for every patient. Suppose now that 



Define $T(t) = 1_{\{G \le t\}}$ as the indicator for whether GvHD occurred before time $t$.
Then (see the later sections for details), for $t < Y$, preventing GvHD from $t$ onwards
leads to, with ~ meaning "is distributed as",

$$Y^{(t)} - t \sim \int_t^Y e^{\psi T(s)}\,ds \quad \text{given } Z_t. \qquad (3)$$

Thus, treated residual time to relapse ($t$ until $Y$) is multiplied by $e^{\psi}$ by
preventing GvHD; compare with accelerated failure time models, see e.g. Cox and Oakes (1984).
This multiplication factor $e^{\psi}$ should be interpreted in a distributional way. Keiding
(1999) and Keiding et al. (1999) assume that (3) is true even with ~ replaced by = (though
only for $t = 0$), noticing that that assumption is unfortunate. Supposing (3) to be true
with ~ replaced by = for all $t$ would be even stronger. The current paper proves that = is
not necessary, under regularity conditions only.



Example 6.2 Survival of AIDS patients. Robins et al. (1992) describe an AIDS clinical trial to
study the effect of AZT treatment on survival in HIV-infected subjects. Embedded within this
trial was an essentially uncontrolled observational study of the effect of prophylaxis therapy
for PCP on survival. PCP, Pneumocystis Carinii Pneumonia, is an opportunistic infection that
afflicts AIDS patients. The aim of Robins et al. (1992) was to study the effect of this
prophylaxis therapy on survival. Thus the outcome of interest $Y$ is the survival time, and
the treatment under study is prophylaxis for PCP. Once treatment with prophylaxis for PCP
started it was never stopped. In one of the models mentioned in Robins et al. (1992) the
factor with which treated residual survival time is multiplied when treatment is withheld can
depend on the AZT treatment the patient received and whether or not the patient had a history
of PCP prior to start of PCP prophylaxis. Since this was a clinical trial for AZT treatment,
the AZT treatment was described by a single variable $R$ indicating the treatment arm the
patient was randomized to ($R$ is 1 or 2). Whether or not the patient had a PCP history prior
to start of prophylaxis is described by an indicator variable $P(t)$. $P(t)$ equals 1 if the
patient had PCP before or at $t$ and before prophylaxis treatment started; otherwise $P(t)$
equals 0. If

$$D_{\psi_1,\psi_2,\psi_3}(y, t; Z_t) =
  \left(1 - e^{\psi_1 + \psi_2 P(t) + \psi_3 R}\right) 1_{\{\text{treated at } t\}}$$

then (see the later sections for details), withholding prophylaxis treatment from $t$ onwards
leads to

$$Y^{(t)} - t \sim \int_t^Y
  e^{1_{\{\text{treated at } s\}} (\psi_1 + \psi_2 P(s) + \psi_3 R)}\,ds
  \quad \text{given } Z_t, \qquad (4)$$

for $t < Y$.



Example 6.3 (Incorporating a-priori biological knowledge). Following Robins (1998), again
consider survival as the outcome of interest. Suppose that it is known that treatment received
at time $t$ only affects survival for patients destined to die by time $t + 5$ if they would
receive no further treatment. An example would be a setting in which failure is death from an
infectious disease, the treatment is a preventive antibiotic treatment which is of no benefit
unless the subject is already infected and, if death occurs, it always does within five weeks
from the time of initial unrecorded subclinical infection. In that case, as remarked in
Robins (1998), the natural restriction on $D$ is that

$$D(y, t; Z_t) = 0 \quad \text{if } y - t > 5.$$

More biostatistical examples of models for $D$ can be found in e.g. Mark and Robins (1993),
Witteman et al. (1998) and Robins (1998).



$D(y, t; Z_t)$ can be interpreted as the infinitesimal effect on the outcome of the treatment
actually being given in the time-interval $[t, t + h)$ (relative to baseline treatment). To be
more precise, from the definition of $D$ we have

$$h \cdot D(y, t; Z_t) =
  \left(F^{-1}_{Y^{(t+h)}|Z_t} \circ F_{Y^{(t)}|Z_t}\right)(y) - y + o(h).$$

In Figure 4 this is sketched. $y$ in the picture is the 0.83 quantile of the distribution of
$Y^{(t)}$ given $Z_t$. For $h > 0$, the 0.83 quantile of the distribution of $Y^{(t+h)}$ given
$Z_t$ is $y + h \cdot D(y, t; Z_t) + o(h)$. Thus, to shift from quantiles of the distribution
of $Y^{(t)}$ to the distribution of $Y^{(t+h)}$ given $Z_t$ ($h > 0$) is approximately the
same as to just add $h \cdot D(y, t; Z_t)$ to those quantiles. For example, if
$F_{Y^{(t+h)}|Z_t}$ lies to the right of $F_{Y^{(t)}|Z_t}$ for $h > 0$, then treatment between
$t$ and $t + h$ increases the outcome (in distribution), and $D(\cdot, t; Z_t)$ is
greater than 0.
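
As a hedged numerical illustration of this expansion, assume (purely for the sake of the
example, and ignoring the conditioning on $Z_t$) that $Y^{(t)}$ is standard normal and that
$Y^{(t+h)}$ is normal with mean $\beta h$ and standard deviation $1 + \gamma h$. The
quantile-quantile transform is then $y \mapsto \beta h + (1 + \gamma h) y$, so that
$D(y) = \beta + \gamma y$. The parameters `beta` and `gamma` and the function names are
assumptions of this sketch.

```python
from statistics import NormalDist


def qq_transform(y, dist_t, dist_t_plus_h):
    """F^{-1}_{t+h}(F_t(y)) for two given distributions."""
    return dist_t_plus_h.inv_cdf(dist_t.cdf(y))


def shift_estimate(y, beta, gamma, h=1e-4):
    """Difference quotient (qq(y) - y) / h, approximating D(y) for small h > 0."""
    dist_t = NormalDist(mu=0.0, sigma=1.0)
    dist_t_plus_h = NormalDist(mu=beta * h, sigma=1.0 + gamma * h)
    return (qq_transform(y, dist_t, dist_t_plus_h) - y) / h


y = 0.95  # roughly the 0.83 quantile of the standard normal distribution
print(round(NormalDist().cdf(y), 2))
print(shift_estimate(y, beta=0.5, gamma=0.0))   # pure location shift: D = beta
print(shift_estimate(y, beta=0.5, gamma=0.2))   # D(y) = beta + gamma * y
```

With a pure location shift, every quantile moves by approximately $\beta h$; with a change
in scale as well, the shift depends on the quantile $y$ considered.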

Consider again this interpretation of $D$ as the infinitesimal effect of treatment given in
$[t, t + h)$. If the outcome of interest is survival, then $D(y, t; Z_t)$ should be zero if
$Z_t$ indicates the patient is dead at time $t$. Indeed in that case $F_{Y^{(t+h)}|Z_t}$ and
$F_{Y^{(t)}|Z_t}$ should be almost surely the same for every $h > 0$, since withholding
treatment after death does not change the survival time. Thus
$F^{-1}_{Y^{(t+h)}|Z_t} \circ F_{Y^{(t)}|Z_t}(y)$ is constant in $h$ for $h > 0$ and therefore
$D(y, t; Z_t) = 0$. However, this reasoning is not precise because of the complication of null
sets. We will therefore just formally define $D(y, t; Z_t)$ to be zero if the outcome of
interest is survival and $Z_t$ indicates the patient is dead at time $t$.

Figure 4: Illustration of the infinitesimal shift-function D

It can be shown that $D \equiv 0$ if treatment does not affect the outcome of interest, as was
conjectured in Robins (1992, 1998). To be more precise, Lok (2001) shows that, for example,
$D \equiv 0$ if and only if for every $h > 0$ and $t$, $Y^{(t+h)}$ has the same distribution
as $Y^{(t)}$ given $Z_t$. That is, $D \equiv 0$ if and only if "at any time $t$, whatever
patient characteristics are selected at that time ($Z_t$), stopping 'treatment as given' at
some fixed time after $t$ would not change the distribution of the outcome in patients with
these patient characteristics".



7 Mimicking counterfactual outcomes 

Define $X(t)$ as the continuous solution to the differential equation

$$X'(t) = D(X(t), t; Z_t) \qquad (5)$$

with final condition $X(\tau) = Y$, the observed outcome (see Figure 5). Then $X(t)$ mimics
$Y^{(t)}$ in the sense that it has the same distribution as $Y^{(t)}$, even given all
patient-information at time $t$, $Z_t$. This rather surprising result was conjectured in
Robins (1998) and we prove it in the current article. To prove this result we need the
following consistency assumption.

Assumption 7.1 (consistency). $Y^{(\tau)}$ has the same distribution as $Y$ given $Z_\tau$.

Notice that since no treatment was given after time $\tau$ and the treatment process is
right-continuous, there is no difference in treatment between $Y^{(\tau)}$ and $Y$. Under this
consistency assumption and regularity conditions only, we prove that indeed (5) has a unique
solution for every $\omega \in \Omega$, and that this solution $X(t)$ mimics $Y^{(t)}$ in the
sense that it has the same distribution as $Y^{(t)}$ given $Z_t$. Non-survival outcomes and
survival outcomes are dealt with in separate sections below. Survival outcomes require a
different set of assumptions, as will be explained there.

Example 7.2 Survival of AIDS patients (continuation of Example 6.1). Suppose that

$$D(y, t; \bar Z_t) = \left(1 - e^{\psi}\right) 1_{\{\text{treated at } t\}}.$$

Then

$$X(t) = t + \int_t^Y e^{\psi 1_{\{\text{treated at } s\}}}\, ds$$

if $Y > t$, and $X(t) = Y$ if $t \ge Y$.
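The closed form in Example 7.2 can be evaluated directly when treatment is switched on and off at finitely many times. The following is only a minimal sketch under that assumption: the function names and the representation of the treatment history as a list of disjoint `(start, stop)` intervals are hypothetical and do not come from the article.

```python
import math

def treated_time(intervals, a, b):
    """Total time within [a, b] during which the patient is treated.

    `intervals` is a hypothetical representation of the treatment history:
    a list of disjoint (start, stop) pairs on which treatment is given.
    """
    total = 0.0
    for start, stop in intervals:
        lo, hi = max(a, start), min(b, stop)
        if hi > lo:
            total += hi - lo
    return total

def mimic_survival(t, y_obs, psi, intervals):
    """X(t) = t + int_t^Y exp(psi * 1{treated at s}) ds if Y > t, and X(t) = Y otherwise."""
    if t >= y_obs:
        return y_obs
    on = treated_time(intervals, t, y_obs)   # time treated in [t, Y]
    off = (y_obs - t) - on                   # time untreated in [t, Y]
    return t + math.exp(psi) * on + off
```

With `psi = 0` the function returns the observed survival time `Y` for every `t`, in line with the remark that $D \equiv 0$ corresponds to no treatment effect.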



Figure 5: An example of a solution $X(t)$ to the differential equation $X'(t) = D(X(t), t; \bar Z_t)$ with final condition $X(\tau) = Y$ in case the outcome is survival time. (The plot shows $X(t)$ against $t$, together with the line $y = t$, a jump time of $Z$, and the observed outcome $Y$.)

Suppose now that we have a correctly specified parametric model for the infinitesimal shift-function $D$, $D_\psi$. Then we can calculate the solution $X_\psi(t)$ to

$$X_\psi'(t) = D_\psi(X_\psi(t), t; \bar Z_t) \tag{6}$$

with final condition $X_\psi(\tau) = Y$. For the true $\psi$, $X_\psi(t)$ has the same distribution as $Y^{(t)}$, the counterfactual outcome with treatment stopped at $t$, given $\bar Z_t$. Thus instead of the unobservable $Y^{(t)}$'s we have the observable $X_\psi(t)$'s, which for the true $\psi$ mimic the $Y^{(t)}$'s. Although we do not know the true $\psi$, this result is the cornerstone to both estimating $\psi$ and testing.
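When no closed form is available, a differential equation with a final condition can be solved numerically by integrating backwards in time. The sketch below uses the classical fourth-order Runge-Kutta scheme, which is a choice made here for illustration and not a method prescribed by the article. The function `D` is assumed to already incorporate the patient's observed history, and it is assumed to be smooth on the interval of integration; at jump times of $Z$ one would integrate interval by interval.

```python
def solve_backward(D, y_final, tau, t, n_steps=1000):
    """Integrate X'(s) = D(X(s), s) backwards from X(tau) = y_final down to time t.

    D is a hypothetical callable (x, s) -> float standing in for
    D_psi(x, s; Z_s) with the patient's observed history already plugged in.
    """
    h = (tau - t) / n_steps
    x, s = y_final, tau
    for _ in range(n_steps):
        # Classical RK4 with a negative step of size h.
        k1 = D(x, s)
        k2 = D(x - 0.5 * h * k1, s - 0.5 * h)
        k3 = D(x - 0.5 * h * k2, s - 0.5 * h)
        k4 = D(x - h * k3, s - h)
        x -= h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        s -= h
    return x
```

For a constant shift `D(x, s) = c` this reproduces `X(t) = Y - c * (tau - t)`, which gives a quick sanity check of an implementation.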



8 Testing and estimation 



The procedures for testing whether treatment affects the outcome of interest, as well as the procedures for estimation of $D$ which are known so far (see Robins, 1992, 1998 and Lok, 2001, 2004), are based on the result of the current article, that $X(t)$ mimics $Y^{(t)}$ in the sense that it has the same distribution as $Y^{(t)}$ given the treatment- and covariate history $\bar Z_t$. Robins (1992, 1998) proposed these testing and estimation procedures, and conjectured that the resulting estimators are asymptotically normal. Lok (2001, 2004) provided a conceptual framework and mathematical formalization for structural nested models in continuous time, and provided conditions under which the resulting estimators are both consistent and asymptotically normal. Here we give an intuitive idea about the estimation and testing procedures proposed in Robins (1992, 1998). For formal statements and proofs we refer to Lok (2001, 2004).



To state the assumption of no unmeasured confounding, consider the covariate- and treatment history of a particular patient. Decisions of the doctors at time $t$ may be based, in part, on recorded information on the state of the patient and treatment before $t$, i.e. on $\bar Z_{t-} = (Z(s) : 0 \le s < t)$, but not on other features predicting the outcome of the patient. In particular, given $\bar Z_{t-}$, changes of treatment at $t$ should be independent of $Y^{(t)}$, the outcome of the patient in case he or she would not have been treated after time $t$. This is not a formal statement: it includes conditioning of null events (since the probability that treatment changes at $t$ may be zero for every fixed $t$) on null events ($\bar Z_{t-}$). For a formal statement see Lok (2001, 2004).

Under no unmeasured confounding, changes of treatment at $t$ should thus be independent of $Y^{(t)}$ given $\bar Z_{t-}$. If $X(t)$ mimics $Y^{(t)}$ in the sense that it has the same distribution as $Y^{(t)}$ given $\bar Z_t$, we expect that under no unmeasured confounding, changes of treatment at $t$ should also be independent of $X(t)$ given $\bar Z_{t-}$.

Suppose now that there is no unmeasured confounding. To test whether treatment affects the outcome of interest, the above idea can be used as follows. If treatment does not affect the outcome of interest, $D \equiv 0$ and thus $X(t) = Y$. So if treatment does not affect the outcome of interest, changes of treatment at $t$ should be independent of $Y$, given $\bar Z_{t-}$. Thus we can test whether treatment affects the outcome of interest by testing whether, given $\bar Z_{t-}$, $Y$ adds to the prediction model for treatment changes.

Also for estimation of the infinitesimal shift-function $D$ we assume that there is no unmeasured confounding. Suppose that we have a correctly specified parametric model $D_\psi$ for $D$. Then we can calculate $X_\psi(t)$, the solution to the differential equation with $D_\psi$ instead of $D$, and with final condition $Y$ (see equation (6)). If $X(t)$ mimics $Y^{(t)}$ in the sense that it has the same distribution as $Y^{(t)}$ given $\bar Z_t$, $X_\psi(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$ for the true $\psi$. Since $Y^{(t)}$ does not add to the prediction model for treatment changes given $\bar Z_{t-}$, $\psi$ could then be estimated by picking the $\psi$ for which, given $\bar Z_{t-}$, $X_\psi$ adds the least to the prediction model for treatment changes. For details and a complete survey we refer to Lok (2001, 2004).



9 Mimicking counterfactual non-survival outcomes 
9.1 Introduction 

The purpose of the current article is to prove that $X(t)$ mimics $Y^{(t)}$ in the sense that $X(t)$ has the same distribution as $Y^{(t)}$ given the covariate- and treatment history $\bar Z_t$. This remarkable result is proved in Section 9 for non-survival outcomes, and in Section 10 for survival outcomes. Section 9.2 states the assumptions and the precise statement of the result, and Sections 9.3-9.14 provide the proof.



9.2 Mimicking counterfactual non-survival outcomes: assumptions and result

We give precise conditions under which $X(t)$ mimics $Y^{(t)}$ in the sense that $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$. We start by having a second look at the definition of $D$, equation (4):

$$D(y, t; \bar Z_t) = \frac{\partial}{\partial h}\Big|_{h=0} \Big(F^{-1}_{Y^{(t+h)}|\bar Z_t} \circ F_{Y^{(t)}|\bar Z_t}\Big)(y).$$

Notice that $D$ involves an uncountable number of distribution functions $F_{Y^{(t+h)}|\bar Z_t}$. In many cases conditioning on $\bar Z_t$ means conditioning on a null-event, so that these conditional distributions are not unique. Every single conditional distribution is almost surely unique, but since we use an uncountable number of them this is not sufficient. Therefore the regularity conditions below should be read as: there exists a collection of conditional distribution functions (see Appendix A) $F_{Y^{(t+h)}|\bar Z_t}$ such that all these regularity conditions are satisfied. These versions of $F_{Y^{(t+h)}|\bar Z_t}$ are chosen in the definition of $D$ as well as everywhere else in this article. We only consider $h \ge 0$, so the derivative with respect to $h$ at $h = 0$ is always the right-hand derivative. As we will show in Section 9.3, all assumptions can be relaxed to $h$ in a neighbourhood $[0, \delta]$ of $0$, provided that this neighbourhood does not depend on $\bar Z$.

With the support of a random variable $X$ we will mean those $x$ such that for every open set $U_x$ containing $x$, $P(X \in U_x) > 0$. Let $y_1$ and $y_2$ be the lower and upper limit of the support of the outcome of interest $Y$. We need them to be finite, and moreover we need

Assumption 9.1 (support).

a) All $F_{Y^{(t+h)}|\bar Z_t}$ for $h \ge 0$ have the same bounded support $[y_1, y_2]$.

b) All $F_{Y^{(t+h)}|\bar Z_t}(y)$ for $h \ge 0$ have a continuous non-zero density $f_{Y^{(t+h)}|\bar Z_t}(y)$ on $y \in [y_1, y_2]$.

c) There exists an $\varepsilon > 0$ such that $f_{Y^{(t)}|\bar Z_t}(y) \ge \varepsilon$ for all $y \in [y_1, y_2]$, $\omega \in \Omega$ and $t \in [0, \tau]$.

The support condition may be restrictive for certain applications. Nevertheless, most real-life situations can be approximated this way, since $y_1$ and $y_2$ are unrestricted and $\varepsilon > 0$ is unrestricted, too. Although the support condition may well be stronger than necessary, it simplifies the analysis considerably and, for that reason, it is adopted here.

The rest of the regularity conditions are smoothness conditions. They allow for non- 
smoothness where the covariate- and treatment process Z jumps. This is important since if 
the covariate- and treatment process Z jumps this can lead to a different prognosis for the 
patient and thus to non-smoothness of the functions concerned. 

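Assumption 9.2 (continuous derivatives). For $\omega \in \Omega$ fixed,

a) $F_{Y^{(t+h)}|\bar Z_t}(y)$ is $C^1$ in $(h, y)$ for $y \in [y_1, y_2]$ and $h \ge 0$.

b) If $Z$ does not jump in $(t_1, t_2)$ then both $\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y)$ and $\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y)$ are continuous in $(y, t)$ on $[y_1, y_2] \times [t_1, t_2)$ and can be continuously extended to $[y_1, y_2] \times [t_1, t_2]$.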

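Assumption 9.3 (bounded derivatives).

a) There exists a constant $C_1$ such that for all $\omega \in \Omega$, $t$, $h \ge 0$ and $y$

$$\left|\frac{\partial}{\partial y} F_{Y^{(t+h)}|\bar Z_t}(y)\right| \le C_1.$$

b) There exists a constant $C_2$ such that for all $\omega \in \Omega$, $t$, $h \ge 0$ and $y$

$$\left|\frac{\partial}{\partial h} F_{Y^{(t+h)}|\bar Z_t}(y)\right| \le C_2.$$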
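Assumption 9.4 (Lipschitz continuity).

a) There exists a constant $L_1$ such that for all $\omega \in \Omega$ and $t$ and $y, z \in [y_1, y_2]$

$$\left|\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) - \frac{\partial}{\partial z} F_{Y^{(t)}|\bar Z_t}(z)\right| \le L_1 |y - z|.$$

b) There exists a constant $L_2$ such that for all $\omega \in \Omega$ and $t$ and $y, z \in [y_1, y_2]$

$$\left|\frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) - \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(z)\right| \le L_2 |y - z|.$$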

The theorem below states that under these regularity conditions, $D(y, t; \bar Z_t)$ exists. Moreover, there exists a unique continuous solution $X(t)$ to the differential equation with $D$ of equation (5),

$$X'(t) = D(X(t), t; \bar Z_t)$$

with final condition $X(\tau) = Y$. Furthermore, if also Assumption 7.1 (consistency) holds, this solution has the same distribution as $Y^{(t)}$ given $\bar Z_t$.

Theorem 9.5 Suppose that Regularity Conditions 9.1-9.4 are satisfied. Then $D(y, t; \bar Z_t)$ exists. Furthermore for every $\omega \in \Omega$ there exists exactly one continuous solution $X(t)$ to $X'(t) = D(X(t), t; \bar Z_t)$ with final condition $X(\tau) = Y$. If also Assumption 7.1 (consistency) is satisfied, then this $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$ for all $t \in [0, \tau]$.

Sometimes there will be specific times $t$ with $P(t \text{ is a jump time of } Z) > 0$. If there are no more than finitely many such times $t$, the above conditions (especially the differentiability conditions) can be adapted appropriately, and the conclusion that $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$ remains true; see Section 9.14.

9.2.1 Simpler regularity conditions 

We state some more restrictive but simpler conditions implying all the conditions in Section 9.2.

Assumption 9.6 (regularity condition).

• (support).

a) There exist finite numbers $y_1$ and $y_2$ such that all $F_{Y^{(t+h)}|\bar Z_t}$ have the same bounded support $[y_1, y_2]$.

b) All $F_{Y^{(t+h)}|\bar Z_t}(y)$ have a continuous non-zero density $f_{Y^{(t+h)}|\bar Z_t}(y)$ on $y \in [y_1, y_2]$.

c) There exists an $\varepsilon > 0$ such that $f_{Y^{(t)}|\bar Z_t}(y) \ge \varepsilon$ for all $y \in [y_1, y_2]$, $\omega \in \Omega$ and $t \in [0, \tau]$.

• (smoothness). For every $\omega \in \Omega$

a) $(y, t, h) \mapsto F_{Y^{(t+h)}|\bar Z_t}(y)$ is differentiable with respect to $t$, $y$ and $h$ with continuous derivatives on $[y_1, y_2] \times [t_1, t_2) \times [0, \infty)$ if $Z$ does not jump in $(t_1, t_2)$; with a continuous extension to $[y_1, y_2] \times [t_1, t_2] \times [0, \infty)$.

b) The derivatives of $F_{Y^{(t+h)}|\bar Z_t}(y)$ with respect to $y$ and $h$ are bounded by constants $C_1$ and $C_2$, respectively.

c) $\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y)$ and $\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y)$ have derivatives with respect to $y$ which are bounded by constants $L_1$ and $L_2$, respectively.

9.3 Outline of the proof 

Throughout the proof we use fixed versions of $F_{Y^{(t+h)}|\bar Z_t}$ satisfying all regularity conditions of Section 9.2. In Section 9.4 we show existence of $D$. We also derive a different expression for $D$, which is often used in the rest of the proof. In Section 9.5 we show existence and uniqueness of solutions $X(t)$ to the differential equation with $D$ with final condition $X(\tau) = Y$. The proof that this $X(t)$ mimics $Y^{(t)}$ in the sense that it has the same distribution as $Y^{(t)}$ given $\bar Z_t$ is based on discretization.

In Section 9.6 we therefore consider the situation where the treatment- and covariate process $Z$ can be fully described by its values at finitely many fixed times $0 < \tau_1 < \tau_2 < \ldots < \tau_K$ and $\tau$. In fact this is the discrete-time situation studied in Lok et al. (2004), but instead of using the shift-function $\gamma$ described in Lok et al. (2004) as a model we use the infinitesimal shift-function $D$ here. Theorem 9.10 in Section 9.6 states that in this discrete-time case, $X(t)$ mimics $Y^{(t)}$ in the sense that it has the same distribution as $Y^{(t)}$ given the discrete-time $\bar Z_t$, under a regularity condition and Consistency Assumption 7.1. The proof of Theorem 9.10 is relatively easy, since in the discrete-time case the continuous solution to the differential equation can be written down explicitly, in terms of conditional distribution functions.

This discrete-time situation can be looked at from two different points of view. Possibly, the covariates and treatment can only change at the fixed times $0, \tau_1, \tau_2, \ldots, \tau_K$ and $\tau$, in which case $\bar Z_t$ may represent the true covariate- and treatment history. In that case in $F_{Y^{(t+h)}|\bar Z_t}$ the "treatment as given until time $t + h$" can be completely deduced from $\bar Z_t$ (for $h > 0$ small, but that suffices to determine $D$). However, it is also possible that $\bar Z_t$ just represents the information available at or considered at time $t$. For modelling this distinction is important; for the proof of Theorem 9.10 it is not.

In Sections 9.7-9.13 we consider the situation where the probability that $Z$ jumps at $t$ equals zero for all $t$. We prove that also in this case $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$, under the conditions of Section 9.2. First, in Section 9.7 we prepare the discretization by constructing a series $\bar Z^{(n)}$, containing more and more information on the covariate- and treatment history $\bar Z$ as $n$ increases. $\bar Z^{(n)}$ depends deterministically on $\bar Z$, so that no extra randomness is necessary to construct $\bar Z^{(n)}$. The discretization does not change $Y^{(t)}$; just the information on the treatment- and covariate process considered is less. The discretized process $\bar Z^{(n)}$ is a covariate- and treatment history as considered in Section 9.6. Therefore (under conditions which have to be checked) we can define $D^{(n)}$ to be

$$D^{(n)}\big(y, t; \bar Z^{(n)}_t\big) = \frac{\partial}{\partial h}\Big|_{h=0} \Big(F^{-1}_{Y^{(t+h)}|\bar Z^{(n)}_t} \circ F_{Y^{(t)}|\bar Z^{(n)}_t}\Big)(y) \tag{7}$$

and we define $X^{(n)}$ to be the continuous solution to the differential equation

$$\frac{d}{dt} X^{(n)}(t) = D^{(n)}\big(X^{(n)}(t), t; \bar Z^{(n)}_t\big) \tag{8}$$

with final condition $X^{(n)}(\tau) = Y$. In Section 9.8 we show existence of $D^{(n)}$ and we give two expressions for $D^{(n)}$ which are often used in the rest of the proof. In Section 9.9 we show that the conditions of the discrete-time result are satisfied for the discretized situation, so that Theorem 9.10 guarantees that there exists a continuous solution $X^{(n)}(t)$ to the differential equation (8), with final condition $X^{(n)}(\tau) = Y$ and with the same distribution as $Y^{(t)}$ given $\bar Z^{(n)}_t$.

We then prove that $X^{(n)}(t)$ converges almost surely to $X(t)$ as $n$ tends to infinity, using a result from differential equation theory which bounds the difference between solutions to differential equations (Sections 9.10-9.12). The proof is concluded in Section 9.13, where we show that $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$ because $X^{(n)}(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z^{(n)}_t$ and $X^{(n)}(t)$ converges almost surely to $X(t)$.

The proof can easily be adapted to the situation where the smoothness conditions only hold for $h$ in a neighbourhood $[0, \delta]$ of $0$, for some fixed $\delta > 0$, by starting the discretization with $n$ satisfying e.g. $\tau / 2^n < \delta / 2$ instead of with $n = 1$. In Section 9.14 we indicate how the proof can be adapted to include situations where the probability that $Z$ jumps is zero except for at finitely many times.

9.4 Existence of and a different expression for D 

The lemma below can be used to prove existence of $D$ and to find a useful formula for $D$ (and later two useful formulas for $D^{(n)}$ in Section 9.8):

Lemma 9.7 Suppose that $F_h$ is a family of non-decreasing functions and $F_h(y)$ is differentiable with respect to $y$ and $h$ in a neighbourhood $U_{h_0, y_0}$ of $(h_0, y_0)$, with derivatives which are continuous in $(h, y)$. If also $F'_{h_0}(y_0)$ is non-zero then $F_h$ is invertible in a neighbourhood of $(h_0, y_0)$. Moreover $\big(\frac{\partial}{\partial h} F_h^{-1}\big)(F_h(y))$ exists and satisfies

$$\frac{\partial}{\partial h} F_h(y) + F_h'(y) \cdot \Big(\frac{\partial}{\partial h} F_h^{-1}\Big)(F_h(y)) = 0$$

in a neighbourhood of $(h_0, y_0)$.
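Proof. Define $\phi : U_{h_0, y_0} \to \mathbb{R}^2$ as

$$\phi(h, y) = (h, F_h(y)).$$

The total derivative of $\phi$ is

$$D\phi(h, y) = \begin{pmatrix} 1 & 0 \\ \frac{\partial}{\partial h} F_h(y) & F_h'(y) \end{pmatrix},$$

so $\phi \in C^1(U_{h_0, y_0}, \mathbb{R}^2)$. Since $F'_{h_0}(y_0)$ is non-zero, $D\phi(h_0, y_0)$ has full rank. Thus the Local Inverse Function Theorem implies that there exists an open neighbourhood $V_{h_0, y_0}$ of $(h_0, y_0)$ such that $W = \phi(V_{h_0, y_0})$ is open and $\phi : V_{h_0, y_0} \to W$ is a $C^1$-diffeomorphism. Hence $\phi^{-1}$ exists and is $C^1$.

Notice that $\phi^{-1}(h, x)$ must have the form $(h, y)$ with $y$ satisfying $F_h(y) = x$. For $(h, x) \in W$ such $y$ is unique, since all $F_h$ are non-decreasing by assumption and $F_h'(y)$ is non-zero on $V_{h_0, y_0}$. Thus $F_h^{-1}(x)$ is well-defined on $W$, and we have

$$\phi^{-1}(h, x) = (h, F_h^{-1}(x)).$$

Both $\phi$ and $\phi^{-1}$ are $C^1$, so we can apply the chain rule to calculate

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = D(\phi \circ \phi^{-1})(h, x) = (D\phi)(\phi^{-1}(h, x)) \cdot (D\phi^{-1})(h, x) = \begin{pmatrix} 1 & 0 \\ \frac{\partial}{\partial h} F_h(y) & F_h'(y) \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \big(\frac{\partial}{\partial h} F_h^{-1}\big)(x) & (F_h^{-1})'(x) \end{pmatrix}$$

with $y = F_h^{-1}(x)$. The lemma follows by comparing the bottom left entries of the matrices on the left- and right-hand side of this equation. □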
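The identity of Lemma 9.7 can be checked numerically on a concrete family. The sketch below uses $F_h(y) = y^{1+h}$ on $(0, 1)$, with inverse $F_h^{-1}(x) = x^{1/(1+h)}$. This family is a hypothetical example chosen here for illustration only, and the derivatives are approximated by central finite differences.

```python
def F(h, y):
    """Hypothetical family of distribution functions on (0, 1): F_h(y) = y^(1+h)."""
    return y ** (1.0 + h)

def F_inv(h, x):
    """Inverse of F_h in its second argument: F_h^{-1}(x) = x^(1/(1+h))."""
    return x ** (1.0 / (1.0 + h))

def lemma_residual(h, y, eps=1e-6):
    """Finite-difference value of dF/dh + F'_h(y) * (d/dh F_h^{-1})(F_h(y)).

    According to Lemma 9.7 this should vanish, up to discretization error.
    """
    dF_dh = (F(h + eps, y) - F(h - eps, y)) / (2.0 * eps)
    dF_dy = (F(h, y + eps) - F(h, y - eps)) / (2.0 * eps)
    x = F(h, y)  # the point at which the inverse is differentiated in h
    dFinv_dh = (F_inv(h + eps, x) - F_inv(h - eps, x)) / (2.0 * eps)
    return dF_dh + dF_dy * dFinv_dh
```

For this family the identity can also be verified by hand: $\frac{\partial}{\partial h} F_h(y) = y^{1+h} \log y$, $F_h'(y) = (1+h) y^h$ and $\big(\frac{\partial}{\partial h} F_h^{-1}\big)(F_h(y)) = -y \log y / (1+h)$, so the two terms cancel.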

We want to use Lemma 9.7 to prove that $D$ exists and to find a nice expression for $D$, but $D$ was defined as a right-hand derivative with respect to $h$. Hence we will use the corollary below, which only requires differentiability etcetera for $h \ge 0$.

Corollary 9.8 Suppose that $F_h$ is a family of non-decreasing functions. Suppose that there exists a neighbourhood $U_{0, y_0}$ of $(0, y_0)$ so that $F_h(y)$ is differentiable with respect to $y$ and $h$ on $U_{0, y_0} \cap \{h \ge 0\}$. For $h = 0$ we here mean the right-hand derivative. Suppose furthermore that these derivatives are continuous in $(h, y)$. If also $F_0'(y_0)$ is non-zero then there exists a neighbourhood $V_{0, y_0}$ of $(0, y_0)$ such that on the restriction of this neighbourhood to $h \ge 0$, $F_h$ is invertible. Moreover $\big(\frac{\partial}{\partial h} F_h^{-1}\big)(F_h(y))$ exists and satisfies

$$\frac{\partial}{\partial h} F_h(y) + F_h'(y) \cdot \Big(\frac{\partial}{\partial h} F_h^{-1}\Big)(F_h(y)) = 0.$$

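Proof. Define an extension $\tilde F$ of $F$ to

$$\tilde U_{0, y_0} = \{(h, y) : h \ge 0 \text{ and } (h, y) \in U_{0, y_0}\} \cup \{(h, y) : h < 0 \text{ and } (-h, y) \in U_{0, y_0}\}$$

(an open neighbourhood of $(0, y_0)$) in the following way:

$$\tilde F_h(y) = \begin{cases} F_h(y) & \text{if } h \ge 0 \\ 2 F_0(y) - F_{-h}(y) & \text{if } h < 0. \end{cases}$$

We wish to apply Lemma 9.7 on $\tilde F$, so we have to show that $\tilde F$ satisfies the conditions of Lemma 9.7. Clearly, $\tilde F$ is continuous on $\tilde U_{0, y_0}$. $\frac{\partial}{\partial h} \tilde F_h(y)$ exists and is continuous on $\tilde U_{0, y_0}$, since on $h < 0$

$$\frac{\partial}{\partial h} \tilde F_h(y) = \frac{\partial}{\partial h} \big(-F_{-h}(y)\big) = \frac{\partial}{\partial h'}\Big|_{h' = -h} F_{h'}(y),$$

which is equal to $\frac{\partial}{\partial h} F_h(y)$ for $h = 0$. Also $\frac{\partial}{\partial y} \tilde F_h(y)$ exists and is continuous on $\tilde U_{0, y_0}$, since on $h < 0$

$$\tilde F_h'(y) = 2 F_0'(y) - F_{-h}'(y),$$

which is equal to $F_0'(y)$ for $h = 0$. For $h < 0$, $\tilde F_h$ may not be non-decreasing. In the proof of Lemma 9.7 we only used this to show that $F_h^{-1}(y)$ is not only locally but also globally unique. This is not necessary here for $h < 0$ anyway: local uniqueness suffices for the differentiability result, and the conclusion is for $h \ge 0$ only. Hence the corollary follows the same way as Lemma 9.7. □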
Because of Assumptions 9.2a and 9.1b, Corollary 9.8 can be applied on $F_h(y) = F_{Y^{(t+h)}|\bar Z_t}(y)$ with $y_0 = y$. Thus $D$ exists and

$$D(y, t; \bar Z_t) = \frac{\partial}{\partial h}\Big|_{h=0} F^{-1}_{Y^{(t+h)}|\bar Z_t}\big(F_{Y^{(t)}|\bar Z_t}(y)\big) = - \frac{\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y)}{\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y)}. \tag{9}$$
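The two expressions for $D$ in equation (9) can be compared numerically. As a sketch, assume (purely for illustration, this family does not appear in the article) that $Y^{(t+h)}$ given $\bar Z_t$ follows an exponential distribution with rate $1 + h$ truncated to $[0, 1]$, so that all distributions share the bounded support $[y_1, y_2] = [0, 1]$ and have a positive density there. The right-hand derivatives in $h$ are approximated by forward differences.

```python
import math

def F(h, y):
    """Hypothetical F_{Y^(t+h)|Z_t}(y): exponential with rate 1+h truncated to [0, 1]."""
    r = 1.0 + h
    return (1.0 - math.exp(-r * y)) / (1.0 - math.exp(-r))

def F_inv(h, x):
    """Quantile function belonging to F(h, .)."""
    r = 1.0 + h
    return -math.log(1.0 - x * (1.0 - math.exp(-r))) / r

def D_definition(y, eps=1e-6):
    """Right-hand derivative in h at h = 0 of F_h^{-1}(F_0(y)), using F_0^{-1}(F_0(y)) = y."""
    return (F_inv(eps, F(0.0, y)) - y) / eps

def D_ratio(y, eps=1e-6):
    """Minus the h-derivative of F at h = 0 divided by the y-derivative of F_0."""
    dF_dh = (F(eps, y) - F(0.0, y)) / eps
    dF_dy = (F(0.0, y + eps) - F(0.0, y - eps)) / (2.0 * eps)
    return -dF_dh / dF_dy
```

In this example both versions agree up to discretization error, and $D$ vanishes at the end points of the support, as it should when all distributions share the same support.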



9.5 Existence and uniqueness of X{t) 

Now that we know that $D$ exists, we show that the differential equation $X'(t) = D(X(t), t; \bar Z_t)$ with final condition $X(\tau) = Y$ has a unique continuous solution. Fix $\omega \in \Omega$ for the rest of Section 9.5. Since $D$ may be discontinuous at the jump times of the covariate- and treatment process $Z$, we consider the intervals between jumps of $Z$ separately. It suffices to prove existence and uniqueness of $X(t)$ with final condition on any interval between jumps of $Z$, because with probability one $Z$ only jumps finitely many times.

Hence suppose that $Z$ does not jump in $(t_1, t_2)$ and that $t_1$ is either a jump time of $Z$ or $0$ and that $t_2$ is either a jump time of $Z$ or $\tau$. From (9) we conclude that $D(y, t; \bar Z_t)$ is continuous on $[y_1, y_2] \times [t_1, t_2)$ because of Assumptions 9.2b and 9.1b. The differential equation has a final condition at the upper end of the interval $[t_1, t_2)$. Therefore we define $\tilde D$ on $[y_1, y_2] \times [t_1, t_2]$ as

$$\tilde D(y, t) = \begin{cases} D(y, t; \bar Z_t) & \text{if } t \in [t_1, t_2) \\ \lim_{s \uparrow t_2} D(y, s; \bar Z_s) & \text{if } t = t_2. \end{cases}$$

This limit exists because of Assumption 9.1b and the extension-assumption in Assumption 9.2b. It makes $\tilde D$ continuous on $[y_1, y_2] \times [t_1, t_2]$. When calculating the continuous solution to $X'(t) = D(X(t), t; \bar Z_t)$ on $[t_1, t_2]$, one means to use $\tilde D$ on $[t_1, t_2]$ if $D$ jumps at $t_2$.

To prove existence and uniqueness of $X$ on $[t_1, t_2]$, we apply Corollary B.4 to the differential equation with $\tilde D$. We check the conditions of Corollary B.4 for $\tilde D$. Continuity of $\tilde D$ was shown in the previous paragraph. $F^{-1}_{Y^{(t+h)}|\bar Z_t} \circ F_{Y^{(t)}|\bar Z_t}(y_1) = y_1$ for all $h$ because of Assumption 9.1a and b, so that $D(y_1, t; \bar Z_t) = 0$. Similarly, $D(y_2, t; \bar Z_t) = 0$. To show that the remaining condition of Corollary B.4 holds, notice that global Lipschitz continuity of $D$ in $y$ on $[y_1, y_2] \times [t_1, t_2)$ with Lipschitz constant $C = L_2 / \varepsilon + L_1 C_2 / \varepsilon^2$ follows from equation (9) with Lemma C.1 and Assumptions 9.4, 9.3b and 9.1c. This same constant works on $[y_1, y_2] \times [t_1, t_2]$ by continuity. Therefore Corollary B.4 implies that there exists a unique solution to the differential equation with $\tilde D$ and that this solution stays in $[y_1, y_2]$.



9.6 Mimicking counterfactual outcomes: discrete time 

In this section we consider the situation where $\bar Z$, the available information on the treatment- and covariate process, can be fully described by its values at finitely many fixed points $0 = \tau_0 < \tau_1 < \tau_2 < \cdots < \tau_K < \tau_{K+1} = \tau$. At these time points, $Z$ may jump with probability greater than zero. We prove that in this situation, $X(t)$ mimics $Y^{(t)}$ in the sense that it has the same distribution as $Y^{(t)}$ given $\bar Z_t$.

We assume that there exist conditional distribution functions (see Appendix A) $F_{Y^{(t)}|\bar Z_{\tau_k}}$ satisfying the following regularity condition:

Assumption 9.9 (smoothness). Suppose that for $k = 0, \ldots, K$ and $t \in [\tau_k, \tau_{k+1}]$ there exist conditional distribution functions $F_{Y^{(t)}|\bar Z_{\tau_k}}$ such that

a) For all $t \in [\tau_k, \tau_{k+1}]$, $F_{Y^{(t)}|\bar Z_{\tau_k}}(y)$ is continuous in $y$.

b) For all $t \in [\tau_k, \tau_{k+1}]$, the support of $F_{Y^{(t)}|\bar Z_{\tau_k}}(y)$ is an interval.

c) For $x \in [0, 1]$ fixed, $F^{-1}_{Y^{(t)}|\bar Z_{\tau_k}}(x)$ is differentiable with respect to $t$ on $[\tau_k, \tau_{k+1}]$.


Throughout Section 9.6 we use fixed versions of $F_{Y^{(t)}|\bar Z_{\tau_k}}(y)$ satisfying Assumption 9.9. Since $\bar Z_t$ contains the same information as $\bar Z_{\tau_k}$ for $t \in [\tau_k, \tau_{k+1})$, we can and will choose the same versions when conditioning on $\bar Z_t$.

Theorem 9.10 below states that under Assumption 9.9, $D(y, t; \bar Z_t)$ exists for all $t$. Furthermore, under Assumptions 9.9 and 7.1 (consistency) there exists a continuous solution $X(t)$ to the differential equation with $D$ with final condition $X(\tau) = Y$ which has the same distribution as $Y^{(t)}$ given $\bar Z_t$. Theorem 9.10 does not state that there is a unique solution to the differential equation with $D$. Sufficient conditions for uniqueness of solutions to differential equations can be found in e.g. Appendix B.

Theorem 9.10 Suppose that the treatment- and covariate process $Z$ can be fully described by its values at finitely many fixed points $0 = \tau_0 < \tau_1 < \tau_2 < \cdots < \tau_K < \tau_{K+1} = \tau$, and suppose also that Smoothness Assumption 9.9 is satisfied. Then $D(y, t; \bar Z_t)$ exists for all $t$. Furthermore if also Assumption 7.1 (consistency) is satisfied, then there exists a continuous solution $X(t)$ to $X'(t) = D(X(t), t; \bar Z_t)$ with final condition $X(\tau) = Y$ for which $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$.



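Proof. For $t \in [\tau_k, \tau_{k+1})$

$$D(y, t; \bar Z_t) = \frac{\partial}{\partial h}\Big|_{h=0} F^{-1}_{Y^{(t+h)}|\bar Z_t} \circ F_{Y^{(t)}|\bar Z_t}(y) = \frac{\partial}{\partial h}\Big|_{h=0} F^{-1}_{Y^{(t+h)}|\bar Z_{\tau_k}} \big(F_{Y^{(t)}|\bar Z_{\tau_k}}(y)\big),$$

so existence of $D(y, t; \bar Z_t)$ on each interval $[\tau_k, \tau_{k+1})$ follows from Assumption 9.9c.

Under Assumption 9.9 we can explicitly write down a solution to the differential equation $X'(t) = D(X(t), t; \bar Z_t)$ with final condition $X(\tau) = Y$, as follows. $X(\tau) = Y$, and for $t \in [\tau_k, \tau_{k+1})$ ($k = 0, \ldots, K$),

$$X(t) = F^{-1}_{Y^{(t)}|\bar Z_{\tau_k}} \circ F_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}} \circ F^{-1}_{Y^{(\tau_{k+1})}|\bar Z_{\tau_{k+1}}} \circ F_{Y^{(\tau_{k+2})}|\bar Z_{\tau_{k+1}}} \circ \cdots \circ F^{-1}_{Y^{(\tau_K)}|\bar Z_{\tau_K}} \circ F_{Y^{(\tau)}|\bar Z_{\tau_K}}(Y). \tag{10}$$

Equivalently, $X(t) = F^{-1}_{Y^{(t)}|\bar Z_{\tau_k}} \circ F_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}}\big(X(\tau_{k+1})\big)$ for $t \in [\tau_k, \tau_{k+1})$. This $X(t)$ is well-defined because of Assumption 9.9 a and b. First we show that it is indeed a continuous solution to $X'(t) = D(X(t), t; \bar Z_t)$ with $X(\tau) = Y$. Next we show that for this $X(t)$, $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$.

Continuity of (10) on $[\tau_k, \tau_{k+1})$ is clear from Assumption 9.9c. Moreover,

$$\lim_{t \uparrow \tau_{k+1}} X(t) = F^{-1}_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}} \circ F_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}}\big(X(\tau_{k+1})\big)$$

because of Assumption 9.9c, which is equal to $X(\tau_{k+1})$ because of Assumption 9.9 a and b. Thus $X(t)$ is also continuous from the left at $t = \tau_{k+1}$.

For $t \in [\tau_k, \tau_{k+1})$, (10) satisfies the differential equation:

$$\begin{aligned} X'(t) &= \frac{\partial}{\partial h}\Big|_{h=0} F^{-1}_{Y^{(t+h)}|\bar Z_{\tau_k}} \circ F_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}}\big(X(\tau_{k+1})\big) \\ &= \frac{\partial}{\partial h}\Big|_{h=0} F^{-1}_{Y^{(t+h)}|\bar Z_t} \circ F_{Y^{(t)}|\bar Z_t}\big(X(t)\big) \\ &= D(X(t), t; \bar Z_t), \end{aligned}$$

where in the second line we use that conditioning on $\bar Z_t$ is the same as conditioning on $\bar Z_{\tau_k}$, so that $F_{Y^{(t)}|\bar Z_t} \circ F^{-1}_{Y^{(t)}|\bar Z_{\tau_k}}$ is the identity because of Assumption 9.9 a and b. Thus indeed $X$ as defined in (10) is a continuous solution to $X'(t) = D(X(t), t; \bar Z_t)$ with $X(\tau) = Y$.

We prove that $X(t)$ as defined in equation (10) has the same distribution as $Y^{(t)}$ given $\bar Z_t$ by induction, starting at $t = \tau$, then $t \in [\tau_K, \tau)$, etcetera. For $t = \tau$, $X(\tau) = Y$, so that $X(\tau)$ has the same distribution as $Y^{(\tau)}$ given $\bar Z_\tau$ because of Assumption 7.1 (consistency). For the induction step, suppose that for $t \in [\tau_k, \tau]$ (for $k = K + 1$ read $t = \tau$), $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$. We need to show that also for $t \in [\tau_{k-1}, \tau_k)$, $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$. Because of the induction hypothesis, $X(\tau_k)$ has the same distribution as $Y^{(\tau_k)}$ given $\bar Z_{\tau_k}$. Therefore, by the tower property for conditioning,

$$\begin{aligned} P\big(X(\tau_k) \le x \,\big|\, \bar Z_{\tau_{k-1}}\big) &= E\big[P\big(X(\tau_k) \le x \,\big|\, \bar Z_{\tau_k}\big) \,\big|\, \bar Z_{\tau_{k-1}}\big] \\ &= E\big[P\big(Y^{(\tau_k)} \le x \,\big|\, \bar Z_{\tau_k}\big) \,\big|\, \bar Z_{\tau_{k-1}}\big] \\ &= P\big(Y^{(\tau_k)} \le x \,\big|\, \bar Z_{\tau_{k-1}}\big) \quad \text{a.s.,} \end{aligned}$$

so $X(\tau_k)$ also has the same distribution as $Y^{(\tau_k)}$ given $\bar Z_{\tau_{k-1}}$. Therefore Lemma A.8 in the appendix and Assumption 9.9a imply that $F_{Y^{(\tau_k)}|\bar Z_{\tau_{k-1}}}(X(\tau_k))$ is uniformly distributed on $[0, 1]$ given $\bar Z_{\tau_{k-1}}$. Then Lemma A.9 in the appendix implies that $X(t) = F^{-1}_{Y^{(t)}|\bar Z_{\tau_{k-1}}} \circ F_{Y^{(\tau_k)}|\bar Z_{\tau_{k-1}}}(X(\tau_k))$ has distribution function $F_{Y^{(t)}|\bar Z_{\tau_{k-1}}}$ given $\bar Z_{\tau_{k-1}}$, so also given $\bar Z_t$. That finishes the induction step, so that indeed $X(t) \sim Y^{(t)}$ given $\bar Z_t$ for all $t \in [0, \tau]$. □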
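One step of the construction in this proof is a quantile-to-quantile map: apply one conditional distribution function and then the inverse of another. The sketch below illustrates a single such step for exponential distributions. The exponential family and the function names are hypothetical choices made here for illustration; the article only requires the conditions of Assumption 9.9.

```python
import math

def exp_cdf(x, rate):
    """Distribution function of an exponential distribution with the given rate."""
    return -math.expm1(-rate * x)

def exp_quantile(u, rate):
    """Quantile function of an exponential distribution with the given rate."""
    return -math.log1p(-u) / rate

def quantile_map(x, rate_from, rate_to):
    """Map x by F_to^{-1}(F_from(x)).

    If x is a draw from the 'from' distribution, F_from(x) is uniform on [0, 1]
    and the result is a draw from the 'to' distribution.
    """
    return exp_quantile(exp_cdf(x, rate_from), rate_to)
```

For exponential distributions the map reduces to the rescaling `x * rate_from / rate_to`, which makes it easy to confirm that the mapped value sits at the same quantile of the target distribution as `x` does in the original one.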

9.7 Discretization and choices of conditional distributions 

We return to the continuous-time setting and define a discretization of the covariate- and 
treatment process Z, on which we will later apply the result of the previous section. We also 
choose versions of the conditional distribution functions given this discretized process. 
For $n$ fixed define $\tau_0^{(n)} = 0$, $\tau_1^{(n)} = \frac{1}{2^n}\tau$, $\tau_2^{(n)} = \frac{2}{2^n}\tau$, $\ldots$, $\tau_{2^n}^{(n)} = \tau$. Consider the grid at stage $n$ consisting of these points. Note that the interval $[0, \tau]$ is split up into $2^n$ intervals of equal length, and when $n$ increases points are added in the middle of these intervals. For ease of notation we drop the superscript $^{(n)}$ in $\tau_k^{(n)}$ if it is clear which $n$ is meant. We define

$$\bar Z_t^{(n)} = \big(Z(\tau_k^{(n)}) : 0 \le \tau_k^{(n)} \le t\big) \quad \text{if } Z \text{ takes values in a discrete space.}$$

If $Z$ is real-valued or vector-valued, $\bar Z_t^{(n)}$ is defined analogously from the values of $Z$ at the grid points $\tau_k^{(n)} \le t$, with these values discretized in such a way that $\bar Z_t^{(n)}$ takes values in a discrete space and the discretization becomes finer as $n$ increases.

Notice that the information about $\bar Z_t$ contained in $\bar Z_t^{(n)}$ increases with $n$: once a grid point is added it stays on the grid for $n$ larger, and the information about $Z$ in a fixed grid point also increases with $n$. Notice also that $\bar Z_t^{(n)}$ depends deterministically on $\bar Z_t$, so that no extra randomness is necessary to construct $\bar Z_t^{(n)}$. Thus $\bar Z_t^{(n)}$ has the properties promised in Section 9.3, the outline of the proof of Section 9.
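The dyadic grid and the discretized history for a process with values in a discrete space can be sketched as follows. The function names and the representation of the path of $Z$ as a callable are hypothetical. Exact fractions are used so that grid points at different stages $n$ can be compared without rounding issues.

```python
from fractions import Fraction

def grid(tau, n):
    """Grid points k * tau / 2^n for k = 0, ..., 2^n at stage n."""
    return [Fraction(k, 2 ** n) * tau for k in range(2 ** n + 1)]

def discretize_history(path, tau, n, t):
    """Values of Z at the stage-n grid points up to and including time t.

    `path` is a hypothetical callable s -> Z(s) with values in a discrete space.
    The result depends deterministically on the path, as the construction requires.
    """
    return [(s, path(s)) for s in grid(tau, n) if s <= t]
```

Since every grid point at stage $n$ is also a grid point at stage $n + 1$, the discretized history at stage $n + 1$ contains at least the information of stage $n$.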

Next we choose versions of conditional distributions. The random variable $\bar Z^{(n)}_{\tau_k}$ takes values in a discrete space. Hence Theorem A.3 implies that there exists a conditional distribution $P_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}$. Because of the tower property for conditional expectations

$$\begin{aligned} P\big(Y^{(t+h)} \le y \,\big|\, \bar Z^{(n)}_{\tau_k}\big) &= E\big[P\big(Y^{(t+h)} \le y \,\big|\, \bar Z_{\tau_k}\big) \,\big|\, \bar Z^{(n)}_{\tau_k}\big] \\ &= E\big[F_{Y^{(t+h)}|\bar Z_{\tau_k}}(y) \,\big|\, \bar Z^{(n)}_{\tau_k}\big]. \end{aligned}$$

This is a conditional distribution function: it is non-decreasing in $y$ since all $F_{Y^{(t+h)}|\bar Z_{\tau_k}}(y)$ are non-decreasing because they are conditional distribution functions, and because of Lebesgue's Dominated Convergence Theorem the limit for $y \to -\infty$ equals $0$ and the limit for $y \to \infty$ equals $1$. Hence the following choices can be made:

Notation 9.11 At this point we choose conditional distributions $P_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}$. Also we choose

$$F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y) = \int F_{Y^{(t)}|\bar Z_{\tau_k} = z}(y)\, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z) \tag{11}$$

to be the version of the conditional distribution function of $Y^{(t)}$ given $\bar Z^{(n)}_{\tau_k}$ which is used in the rest of the proof. If $s \in (\tau_k, \tau_{k+1})$ we take the same version for $F_{Y^{(t)}|\bar Z^{(n)}_s}$; this is possible since for $s \in [\tau_k, \tau_{k+1})$, $\bar Z^{(n)}_s = \bar Z^{(n)}_{\tau_k}$.

Notice that $\bar Z^{(n)}$ has been constructed with values in a discrete space. This will assure that the two different expressions for $D^{(n)}$ in Section 9.8 are equal except for at a null set which does not depend on $y$ and $t$ (for a proof see Section 9.8). One of these expressions for $D^{(n)}$ is used to prove smoothness of $D^{(n)}$, which we need in Section 9.10 to bound the difference between $X^{(n)}(t)$ and $X(t)$ in terms of $D^{(n)}$ and $D$. The other expression is used in Section 9.11 to prove that $D^{(n)}$ converges to $D$, so that this bound converges to zero.
9.8 Existence of and two expressions for $D^{(n)}$

We prove existence of $D^{(n)}$ as defined in equation (7). Moreover, we derive two useful formulas for $D^{(n)}$, which are both similar to equation (9) for $D$.

First we show existence of $D^{(n)}$. Fix $n$ and $t$, and choose $\tau_k^{(n)}$ such that $t \in [\tau_k^{(n)}, \tau_{k+1}^{(n)})$. As before, we drop the superscript $^{(n)}$ in $\tau_k^{(n)}$ for ease of notation. Define

$$F_h(y) = F_{Y^{(t+h)}|\bar Z^{(n)}_{\tau_k}}(y)$$

(recall from Notation 9.11 that we chose this version of conditional distribution). In view of the definition (7) of $D^{(n)}$,

$$D^{(n)}\big(y, t; \bar Z^{(n)}_t\big) = \frac{\partial}{\partial h}\Big|_{h=0} F^{-1}_{Y^{(t+h)}|\bar Z^{(n)}_t} \circ F_{Y^{(t)}|\bar Z^{(n)}_t}(y),$$

we wish to apply Lemma 9.7 (for $t \in (\tau_k, \tau_{k+1})$) and Corollary 9.8 (for $t = \tau_k$) on $F_h(y)$, in $(h_0, y_0) = (0, y)$.

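We check the conditions. Every $F_{Y^{(t+h)}|\bar Z_{\tau_k} = z}(y)$ is non-decreasing in $y$, since these are conditional distribution functions. Since $P_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}$ is a probability measure, $F_h(y)$ is also non-decreasing. Next we show that $F_h(y)$ is differentiable with respect to $y$ with derivative $\int \frac{\partial}{\partial y} F_{Y^{(t+h)}|\bar Z_{\tau_k} = z}(y)\, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)$. For $\omega$ fixed, $P_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}$ is a probability measure on the space in which $\bar Z_{\tau_k}$ takes its values. Therefore we can hope to apply Theorem D.3 in the appendix with $\mu = P_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}$ and $f_y(z) = F_{Y^{(t+h)}|\bar Z_{\tau_k} = z}(y)$. We check the conditions of Theorem D.3. Because of Assumption 9.2a, $\frac{\partial}{\partial y} f_y$ exists for every $y$. Because of Assumption 9.3a, $\big|\frac{\partial}{\partial y} f_y\big| \le C_1$, which is $\mu$-integrable since $\mu$ is a probability measure. Also $f_y$ is $\mu$-integrable for every $y$, since $f_y$ is bounded by $1$. Thus indeed Theorem D.3 can be applied, and we conclude that $F_h(y)$ is differentiable with respect to $y$ with derivative $\int \frac{\partial}{\partial y} F_{Y^{(t+h)}|\bar Z_{\tau_k} = z}(y)\, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)$. The same way (but with Assumption 9.3b instead of 9.3a) we find that $F_h(y)$ is differentiable with respect to $h$ with derivative $\int \frac{\partial}{\partial h} F_{Y^{(t+h)}|\bar Z_{\tau_k} = z}(y)\, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)$. That these derivatives of $F_h(y)$ with respect to $y$ and $h$ are continuous in $(y, h)$ follows from Lebesgue's Dominated Convergence Theorem applied on the expressions we just derived (the conditions are satisfied because of Assumptions 9.2a and 9.3). Furthermore, $F_0'(y) = \int \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_{\tau_k} = z}(y)\, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)$ is non-zero since $\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_{\tau_k} = z}(y)$ is greater than $0$ for every $\omega$ (Assumption 9.1b).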

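Thus the conditions of Lemma 9.7 (for $t \in (\tau_k, \tau_{k+1})$) and Corollary 9.8 (for $t = \tau_k$) are satisfied for $F_h(y)$, and we conclude that $\frac{\partial}{\partial h}\big|_{h=0} F^{-1}_{Y^{(t+h)}|\bar Z^{(n)}_t}\big(F_{Y^{(t)}|\bar Z^{(n)}_t}(y)\big)$ exists, and that $D^{(n)}(y, t; \bar Z^{(n)}_t)$ exists and satisfies

$$\begin{aligned} D^{(n)}\big(y, t; \bar Z^{(n)}_t\big) = \frac{\partial}{\partial h}\Big|_{h=0} F^{-1}_{Y^{(t+h)}|\bar Z^{(n)}_t}\big(F_{Y^{(t)}|\bar Z^{(n)}_t}(y)\big) &= - \frac{\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z^{(n)}_{\tau_k}}(y)}{\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y)} \\ &= - \frac{\int \frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_{\tau_k} = z}(y)\, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)}{\int \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_{\tau_k} = z}(y)\, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)}, \end{aligned} \tag{12}$$

where in the second line the former paragraph is used once again. This expression for $D^{(n)}$ will be used in Sections 9.9 and 9.10 to prove smoothness of $D^{(n)}$ in $(y, t)$, which we need in order to use results about differential equations.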

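We proceed with a second expression for $D^{(n)}$. We show that there exists an $\Omega' \subset \Omega$ with probability one such that

$$D^{(n)}\big(y, t; \bar Z^{(n)}_t\big) = - \frac{E\Big[\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t\Big]}{E\Big[\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t\Big]} \qquad \forall \omega \in \Omega' \;\; \forall y \;\; \forall t \;\; \forall n. \tag{13}$$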

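First we choose this $\Omega'$, in such a way that on $\Omega'$ conditional probabilities given $\bar Z^{(n)}_{\tau_k}$ are unique, for all $n$ and $\tau_k$.

Fix $n$ and $\tau_k$ for a moment. We know from general theory about conditioning that conditional probabilities given $\bar Z^{(n)}_{\tau_k} = z$ can be written as a measurable function of $z$. We also know that conditional probabilities given $\bar Z^{(n)}_{\tau_k}$ are almost surely unique. Combining these two facts we see that conditional probabilities given $\bar Z^{(n)}_{\tau_k} = z$ are unique except for at $\omega$'s for which $\bar Z^{(n)}_{\tau_k}(\omega)$ has probability zero, that is except for $\omega$'s in

$$\bigcup_{z :\, P(\bar Z^{(n)}_{\tau_k} = z) = 0} \big\{\omega \in \Omega : \bar Z^{(n)}_{\tau_k}(\omega) = z\big\}.$$

Since, by construction, $\bar Z^{(n)}_{\tau_k}$ takes only countably many values, this is a countable union of null sets and thus a null set. We define

$$\Omega' = \Omega \setminus \bigcup_{n} \bigcup_{\tau_k^{(n)}} \bigcup_{z :\, P(\bar Z^{(n)}_{\tau_k} = z) = 0} \big\{\omega \in \Omega : \bar Z^{(n)}_{\tau_k}(\omega) = z\big\}. \tag{14}$$

This set has probability one since its complement is a countable union of null sets: $\mathbb{N}$ is countable and for each $n$ there are only finitely many $\tau_k$. On this $\Omega'$ conditional probabilities given $\bar Z^{(n)}_{\tau_k}$ are unique, for all $n$ and $\tau_k$.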

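We show now that (13) holds for $\Omega'$ as defined in equation (14). Since $\bar Z^{(n)}_{\tau_k}$ takes values in a discrete space, Theorem A.3 in the appendix implies that there exists a conditional distribution $P_{\bar Z_t|\bar Z^{(n)}_{\tau_k}}$. For $t \ge \tau_k$, because of the tower property for conditional expectations,
\[
F_{Y^{(t+h)}|\bar Z^{(n)}_{\tau_k}}(y)
= E\Big[ E\big[ 1_{\{Y^{(t+h)} \le y\}} \,\big|\, \bar Z_t \big] \,\Big|\, \bar Z^{(n)}_{\tau_k} \Big]
= \int F_{Y^{(t+h)}|\bar Z_t = z}(y)\,dP_{\bar Z_t|\bar Z^{(n)}_{\tau_k}}(z) \quad \text{a.s.} \qquad (15)
\]
On $\Omega'$ this version is the same as the one used in the definition of $D^{(n)}$, since conditional probabilities given $\bar Z^{(n)}_{\tau_k}$ are unique on $\Omega'$. In view of this we wish to apply Corollary 9.8 on
\[
F_h(y) = \int F_{Y^{(t+h)}|\bar Z_t = z}(y)\,dP_{\bar Z_t|\bar Z^{(n)}_{\tau_k}}(z)
\]
with $y_0 = y$. Verifying the conditions of Corollary 9.8 can be done in exactly the same way as for the first expression for $D^{(n)}$. Therefore Corollary 9.8 implies that for $\omega \in \Omega'$ and $t \in [\tau_k,\tau_{k+1})$,
\[
\frac{\partial}{\partial h}\Big|_{h=0} F^{-1}_{Y^{(t+h)}|\bar Z^{(n)}_{\tau_k}} \circ F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y)
= -\frac{\frac{\partial}{\partial h}\big|_{h=0} \int F_{Y^{(t+h)}|\bar Z_t = z}(y)\,dP_{\bar Z_t|\bar Z^{(n)}_{\tau_k}}(z)}{\frac{\partial}{\partial y} \int F_{Y^{(t)}|\bar Z_t = z}(y)\,dP_{\bar Z_t|\bar Z^{(n)}_{\tau_k}}(z)}
\]
\[
= -\frac{E\Big[\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_{\tau_k}\Big]}{E\Big[\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_{\tau_k}\Big]},
\]
where in the second line the former paragraph is used again. Equation (13) follows. This second expression for $D^{(n)}$ will be used in Section 9.11 to prove convergence of $D^{(n)}$ to $D$.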



9.9 Applying the discrete-time result 

We prove the following lemma: 

Lemma 9.12 Suppose that Regularity Conditions 9.1-9.4 and Assumption 7.1 (consistency) are satisfied. Then for every $n$ there exists a continuous solution $X^{(n)}(t)$ to the differential equation in the discretized setting,
\[
\frac{d}{dt} X^{(n)}(t) = D^{(n)}\big(X^{(n)}(t), t; \bar Z^{(n)}_t\big),
\]
with final condition $X^{(n)}(\tau) = Y$. $X^{(n)}(t)$ is almost surely unique. Furthermore, $X^{(n)}(t)$ has the same conditional distribution as $Y^{(t)}$ given $\bar Z^{(n)}_t$.



Proof. Fix $n$. As before we drop the superscript $(n)$ of $\tau_k^{(n)}$ for notational simplicity. We first show that there exists a continuous solution $X^{(n)}$ for which $X^{(n)}(t)$ has the same conditional distribution as $Y^{(t)}$ given $\bar Z^{(n)}_t$, using Theorem 9.10. We thus need to check that the conditional distributions $F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}$ of $Y^{(t)}$ given $\bar Z^{(n)}_{\tau_k}$ chosen in Notation 9.11 satisfy Assumptions 9.9 and 7.1. The versions chosen in Notation 9.11 are
\[
F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y) = \int F_{Y^{(t)}|\bar Z_{\tau_k}=z}(y)\,dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z).
\]
In the third paragraph of Section 9.8 we showed that $F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y)$ is strictly increasing and differentiable with respect to $y$ on $[y_1,y_2]$, which accounts for Assumption 9.9 a and b. In Section 9.8 we also concluded that for $x \in [0,1]$ fixed, $F^{-1}_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(x)$ is differentiable with respect to $t$ on $[\tau_k,\tau]$, which accounts for Assumption 9.9 c. That $Y^{(\tau)}$ has the same conditional distribution as $Y$ given $\bar Z^{(n)}_{\tau_k}$ follows from Assumption 7.1 and the tower property for conditioning. Hence Theorem 9.10 guarantees existence of a continuous solution $X^{(n)}$ to $\frac{d}{dt}X^{(n)}(t) = D^{(n)}\big(X^{(n)}(t),t;\bar Z^{(n)}_t\big)$ with final condition $X^{(n)}(\tau) = Y$ and with $X^{(n)}(t) \sim Y^{(t)}$ given $\bar Z^{(n)}_t$.

Theorem 9.10 does not imply that $X^{(n)}$ is unique. Almost sure uniqueness of $X^{(n)}$ follows with Corollary B.4 in Appendix B in the same way as uniqueness of $X$ (see Section 9.5), but using equations (13) and (12) for $D^{(n)}$ instead of the corresponding expression for $D$, as follows. We check the conditions of Corollary B.4 for $D^{(n)}$. Fix $n$ and suppose that $t \in [\tau_k,\tau_{k+1})$ (the superscript $(n)$ of $\tau_k^{(n)}$ is dropped for notational simplicity). First we prove continuity of $D^{(n)}$ on $[y_1,y_2] \times [\tau_k,\tau_{k+1})$ with a continuous extension to $[y_1,y_2] \times [\tau_k,\tau_{k+1}]$, using equation (12). Expression (12) for $D^{(n)}$ has an obvious extension to $[\tau_k,\tau_{k+1}]$. We prove that this extension is continuous on $[y_1,y_2] \times [\tau_k,\tau_{k+1}]$. To show that $\int \frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_{\tau_k}=z}(y)\,dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)$ and $\int \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_{\tau_k}=z}(y)\,dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)$ are continuous in $(y,t)$ we use Lebesgue's Dominated Convergence Theorem.
\[
\frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_{\tau_k}=z}(y) = \frac{\partial}{\partial h}\Big|_{h=t-\tau_k} F_{Y^{(\tau_k+h)}|\bar Z_{\tau_k}=z}(y)
\]
and
\[
\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_{\tau_k}=z}(y) = \frac{\partial}{\partial y} F_{Y^{(\tau_k+h)}|\bar Z_{\tau_k}=z}(y)\Big|_{h=t-\tau_k}
\]
are continuous in $(y,t)$ because of Assumption 9.2b. Both these derivatives are bounded because of Assumption 9.3. Therefore Lebesgue's Dominated Convergence Theorem implies that the integrals of these derivatives with respect to the measure $\mu = P_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}$ are continuous in $(y,t)$. Because of Assumption 9.1b the denominator of $D^{(n)}$ is non-zero for $y \in [y_1,y_2]$, so that indeed $D^{(n)}$ is continuous in $(y,t)$ on $[y_1,y_2] \times [\tau_k,\tau_{k+1}]$.

Next we show that $D^{(n)}$ is Lipschitz continuous in $y$ on $[y_1,y_2] \times [\tau_k,\tau_{k+1}]$ with Lipschitz constant $L_2/\varepsilon + C_2L_1/\varepsilon^2$ for all $\omega \in \Omega'$, with $\Omega'$ the set of probability one which was introduced in Section 9.8, equation (14). Expression (13) for $D^{(n)}$ on $\Omega'$ has an obvious extension to $[\tau_k,\tau_{k+1}]$. We prove that this
\[
D^{(n)}(y,t;\bar Z^{(n)}_t)
= -\frac{E\Big[\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t\Big]}{E\Big[\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t\Big]}
\]
is Lipschitz continuous in $y$ on $[y_1,y_2] \times [\tau_k,\tau_{k+1}]$ with Lipschitz constant $L_2/\varepsilon + C_2L_1/\varepsilon^2$. We check the conditions of Lemma C.1 in Appendix C. The numerator is Lipschitz continuous in $y$ with Lipschitz constant $L_2$ because of Assumption 9.4b. The denominator is Lipschitz continuous in $y$ with Lipschitz constant $L_1$ because of Assumption 9.4a. The numerator is bounded by $C_2$ because of Assumption 9.3b, and the denominator is bounded from below by $\varepsilon$ because of Assumption 9.1b. Hence indeed Lemma C.1 implies that the extension of $D^{(n)}$ to $[\tau_k,\tau_{k+1}]$ is Lipschitz continuous in $y$ with Lipschitz constant $L_2/\varepsilon + C_2L_1/\varepsilon^2$ on $\Omega'$. Because of Assumption 9.1b the denominator is bounded away from 0 for $y \in [y_1,y_2]$, and because of Assumption 9.1a the numerator is equal to zero for $y = y_1$ and for $y = y_2$. Hence, on $\Omega'$, $D^{(n)}(y_1,t) = D^{(n)}(y_2,t) = 0$. Therefore Corollary B.4 implies that, on $\Omega'$, there exists a unique solution to the differential equation with $D^{(n)}$ on $[\tau_k,\tau_{k+1}]$, and this solution stays in $[y_1,y_2]$. Since for $n$ fixed there are only finitely many $\tau_k$, the same is true on $[0,\tau]$. □

Notice that $X^{(n)}(t) \sim Y^{(t)}$ given $\bar Z^{(n)}_t$ implies that for every $t$, $X^{(n)}(t) \in [y_1,y_2]$ almost surely.

In fact, the proof of Theorem 9.10 combined with the proof of the uniqueness statement of Lemma 9.12 leads to a stronger statement: $X^{(n)}(t) \in [y_1,y_2]$ for every $t$ for every $\omega \in \Omega'$, with $\Omega'$ the set of probability one which was defined in equation (14).

9.10 Bounding the difference between $X$ and $X^{(n)}$ in terms of $D$ and $D^{(n)}$

In this section we bound the difference between $X$ and $X^{(n)}$ in terms of $D$ and $D^{(n)}$. To do this we apply Corollary B.4 on $z = X^{(n)}(t)$ and $y = X(t)$. Since we need that both $D$ and $D^{(n)}$ are continuous, we apply Corollary B.4 on the intervals between the jumps of $Z$ and the grid points $\tau_k^{(n)}$. Fix $n$ and restrict $\omega$ to $\omega \in \Omega'$, with $\Omega'$ the set of probability one as defined in equation (14), so that the expression for $D^{(n)}$ of equation (13) can be used. To focus attention on the differential equations, the $\bar Z_t$'s and $\bar Z^{(n)}_t$'s in $D$ and $D^{(n)}$ are skipped below.

Suppose that $(t_1,t_2)$ is such an interval including no jumps of $Z$ and no grid points at stage $n$. We check the conditions of Corollary B.4 for $z = X^{(n)}(t)$ and $y = X(t)$. In Section 9.5 we already saw that $D : [y_1,y_2] \times [t_1,t_2) \to \mathbb{R}$ has a continuous extension $D : [y_1,y_2] \times [t_1,t_2] \to \mathbb{R}$ which satisfies the conditions of Corollary B.4 with $C$ the constant function $L_2/\varepsilon + C_2L_1/\varepsilon^2$, and in the proof of Lemma 9.12 in Section 9.9 we saw that on $\Omega'$ the same is true for $D^{(n)}$. Therefore Corollary B.4 implies that for $t \in [t_1,t_2]$, with $C = L_2/\varepsilon + C_2L_1/\varepsilon^2$ as above,
\[
\big|X^{(n)}(t) - X(t)\big| \le e^{C(t_2-t)} \big|X^{(n)}(t_2) - X(t_2)\big| + \int_t^{t_2} e^{C(s-t)} \big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|\,ds. \qquad (16)
\]

If $Z$ does not jump in $[(1-1/2^n)\tau,\tau]$ we can apply (16) on $[(1-1/2^n)\tau,\tau]$, and since $X^{(n)}(\tau) = X(\tau) = Y$ we find that
\[
\big|X^{(n)}(t) - X(t)\big| \le \int_t^{\tau} e^{C(s-t)} \big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|\,ds \qquad (17)
\]
on $[(1-1/2^n)\tau,\tau]$. If $Z$ does not jump after $(1-2/2^n)\tau$ we can also apply (16) on $[(1-2/2^n)\tau,(1-1/2^n)\tau]$, and using equation (17) for $t = (1-1/2^n)\tau$ we find that equation (17) also holds on $[(1-2/2^n)\tau,(1-1/2^n)\tau]$:
\[
\big|X^{(n)}(t) - X(t)\big| \le e^{C((1-1/2^n)\tau - t)} \int_{(1-1/2^n)\tau}^{\tau} e^{C(s-(1-1/2^n)\tau)} \big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|\,ds
\]
\[
\qquad + \int_t^{(1-1/2^n)\tau} e^{C(s-t)} \big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|\,ds
= \int_t^{\tau} e^{C(s-t)} \big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|\,ds.
\]
If $Z$ does not jump in $((1-m/2^n)\tau,\tau]$ then, with the same reasoning, equation (17) holds for $t \in ((1-m/2^n)\tau,\tau]$.

Suppose now that $Z$ jumps in $((1-(m+1)/2^n)\tau,(1-m/2^n)\tau]$. Then this interval can be split up into the part before and the part after the jump, so that, again with the same reasoning as before and since both $X^{(n)}$ and $X$ are continuous in $t$, we still have that
\[
\big|X^{(n)}(t) - X(t)\big| \le \int_t^{\tau} e^{C(s-t)} \big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|\,ds. \qquad (18)
\]
With probability one there are at most finitely many jump times of $Z$, so that (18) holds almost surely for all $t$, and even
\[
\sup_{t \in [0,\tau]} \big|X^{(n)}(t) - X(t)\big| \le \sup_{t \in [0,\tau]} \int_t^{\tau} e^{C(s-t)} \big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|\,ds
\]
\[
\le \int_0^{\tau} e^{Cs} \big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|\,ds \quad \text{a.s.} \qquad (19)
\]
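A bound of the form (19) can be illustrated numerically on a toy pair of differential equations with a common final condition. The sketch below is only an illustration under stated assumptions: the drift `D`, the perturbed drift `D_n` (a stand-in for $D^{(n)}$), the horizon and the final value are all hypothetical, and explicit Euler steps replace the exact solutions.

```python
import math

def solve_backward(drift, y_final, tau=1.0, steps=20000):
    """Explicit Euler for x'(t) = drift(x, t), integrated backward from x(tau) = y_final.
    Returns the time grid and the path, both ordered from t = 0 to t = tau."""
    dt = tau / steps
    ts = [i * dt for i in range(steps + 1)]
    xs = [0.0] * (steps + 1)
    xs[steps] = y_final
    for i in range(steps, 0, -1):
        xs[i - 1] = xs[i] - dt * drift(xs[i], ts[i])
    return ts, xs

C = 0.5  # Lipschitz constant in y of both hypothetical drifts below

def D(y, t):
    return -0.5 * math.sin(y)

def D_n(y, t):
    return -0.5 * math.sin(y) + 0.1  # perturbed drift, hypothetical stand-in for D^(n)

ts, x = solve_backward(D, 1.0)
_, x_n = solve_backward(D_n, 1.0)

# Left-hand side of the bound: sup over the grid of |X_n(t) - X(t)|
sup_diff = max(abs(a - b) for a, b in zip(x_n, x))

# Riemann sum for the right-hand side: int_0^tau e^{C s} |D - D_n|(X_n(s), s) ds
dt = ts[1] - ts[0]
bound = sum(math.exp(C * s) * abs(D(v, s) - D_n(v, s)) * dt
            for s, v in zip(ts[:-1], x_n[:-1]))

print(sup_diff, bound)
```

In this toy case the supremum of the difference stays below the integral bound, because the two solutions share the final condition and the drifts differ by a bounded amount.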



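9.11 Convergence of $D^{(n)}$ to $D$

We prove that $D^{(n)}(y,t;\bar Z^{(n)}_t)$ converges almost surely to $D(y,t;\bar Z_t)$, for fixed $(y,t) \in [y_1,y_2] \times [0,\tau]$. From the expression for $D$ and equation (13) we know that
\[
D(y,t;\bar Z_t) = -\frac{\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y)}{\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y)}
\]
and
\[
D^{(n)}(y,t;\bar Z^{(n)}_t) = -\frac{E\Big[\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t\Big]}{E\Big[\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t\Big]} \quad \text{a.s.}
\]
We can apply Lévy's Upward Theorem (see e.g. Williams, 1991, page 134) on the denominator and the numerator, since both $\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y)$ and $\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y)$ are bounded (Assumption 9.3). Lévy's Upward Theorem leads to
\[
E\Big[\frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t\Big] \to E\Big[\frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \sigma\Big(\bigcup_{n} \sigma\big(\bar Z^{(n)}_t\big)\Big)\Big] \quad \text{a.s.} \qquad (20)
\]
and
\[
E\Big[\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t\Big] \to E\Big[\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \sigma\Big(\bigcup_{n} \sigma\big(\bar Z^{(n)}_t\big)\Big)\Big] \quad \text{a.s.} \qquad (21)
\]
as $n \to \infty$. Replacing the conditioning on $\sigma\big(\bigcup_n \sigma(\bar Z^{(n)}_t)\big)$ by conditioning on $\bar Z_t$ in (20) and (21) is allowed because of Lemma A.11 in Appendix A. Since moreover the denominators are bounded away from 0 (Assumption 9.1b), the Continuous Mapping Theorem implies that, for fixed $(y,t) \in [y_1,y_2] \times [0,\tau]$,
\[
D^{(n)}(y,t;\bar Z^{(n)}_t) \to D(y,t;\bar Z_t) \quad \text{a.s.} \qquad (22)
\]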
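The mechanism behind (20) and (21), conditional expectations given ever finer discretizations converging to the conditional expectation given the limit, can be seen in a toy case. The sketch below is an assumption-laden illustration, not the setting of this paper: $U$ is uniform on $[0,1)$, the level-$n$ discretization records only the dyadic cell of width $2^{-n}$ containing $U$, and $f(u) = u^2$ is an arbitrary bounded function. In this case the limiting $\sigma$-algebra is that of $U$ itself, so the limit is $f(U)$.

```python
def cell_average(f_antiderivative, u, n):
    """E[f(U) | level-n dyadic cell containing u], for U uniform on [0, 1).
    The conditional expectation is the average of f over that cell."""
    width = 2.0 ** (-n)
    a = int(u / width) * width   # left end of the level-n cell containing u
    b = a + width
    return (f_antiderivative(b) - f_antiderivative(a)) / width

def f(u):
    return u * u

def f_anti(u):
    return u ** 3 / 3.0   # antiderivative of f

u = 0.3
errors = [abs(cell_average(f_anti, u, n) - f(u)) for n in range(1, 21)]
print(errors[0], errors[9], errors[19])
```

The error shrinks with the cell width, which mirrors how the discretized conditional expectations approach the continuous-time ones as the grid is refined.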
9.12 $X^{(n)}(t)$ converges to $X(t)$ and $X(t)$ is measurable

We show that $X^{(n)}(t)$ converges almost surely to $X(t)$ and that $X(t)$ is measurable. The bound
\[
\sup_{t \in [0,\tau]} \big|X^{(n)}(t) - X(t)\big| \le \int_0^{\tau} e^{Cs} \big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|\,ds \quad \text{a.s.}
\]
of equation (19) and almost sure convergence of $D^{(n)}(y,t)$ to $D(y,t)$ for fixed $(y,t)$ (see equation (22)) are the starting point here.

First we prove that the integrand in the above bound converges almost surely to 0, by proving that for $s$ fixed, $D^{(n)}\big(X^{(n)}(s),s\big) - D\big(X^{(n)}(s),s\big)$ converges almost surely to 0. Recall that in Section 9.5 we saw that $D : [y_1,y_2] \times [t_1,t_2) \to \mathbb{R}$ has a continuous extension $D : [y_1,y_2] \times [t_1,t_2] \to \mathbb{R}$ which is Lipschitz continuous in $y$ with Lipschitz constant $L_2/\varepsilon + C_2L_1/\varepsilon^2$. Recall also that in the proof of Lemma 9.12 in Section 9.9 we saw that on $\Omega'$, the set of probability one introduced in equation (14) for which we have expression (13) for $D^{(n)}$, the same is true for $D^{(n)}$. Therefore the pointwise almost sure convergence of $D^{(n)}$ to $D$ of equation (22) implies, with Corollary D.2 in Appendix D, that for fixed $s$ indeed
\[
\big|D^{(n)}\big(X^{(n)}(s),s\big) - D\big(X^{(n)}(s),s\big)\big| \to 0 \quad \text{a.s.} \qquad (23)
\]
To show that this implies that the bound converges almost surely to 0, define
\[
A = \Big\{(s,\omega) \in [0,\tau] \times \Omega : \big|D^{(n)}\big(X^{(n)}(s),s\big) - D\big(X^{(n)}(s),s\big)\big| \to 0\Big\},
\]
with $A_s$ its section at $s$ and $A_\omega$ its section at $\omega$. Then
\[
A_s = \Big\{\omega \in \Omega : D^{(n)}\big(X^{(n)}(s),s\big) - D\big(X^{(n)}(s),s\big) \to 0\Big\}
\]
has probability one because of the former paragraph. Therefore, using Fubini's Theorem, with $\lambda$ the Lebesgue measure on $[0,\tau]$,
\[
(\lambda \times P)(A) = \int_{[0,\tau]} P(A_s)\,d\lambda(s) = \int_{[0,\tau]} 1\,d\lambda(s) = \tau.
\]
Also by Fubini's Theorem,
\[
(\lambda \times P)(A) = \int \lambda(A_\omega)\,dP(\omega),
\]
so that since $\lambda(A_\omega) \le \tau$, $\lambda(A_\omega) = \tau$ $P$-almost everywhere. This shows that for $P$-almost all $\omega$, $A_\omega$ has measure $\tau$. So for $P$-almost all $\omega$, $\big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|$ converges to 0 for $\lambda$-almost all $s$. Moreover, because of the expression for $D$, expression (13) for $D^{(n)}$ and Assumptions 9.3b and 9.1b, $e^{Cs}\big|D(\cdot,s) - D^{(n)}(\cdot,s)\big|$ is bounded by $2e^{C\tau}C_2/\varepsilon$ on $\Omega'$. Therefore for almost all $\omega$ Lebesgue's Dominated Convergence Theorem can be applied on the integral of $e^{Cs}\big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|$ with respect to $\lambda$, $\int_{[0,\tau]} e^{Cs}\big|D\big(X^{(n)}(s),s\big) - D^{(n)}\big(X^{(n)}(s),s\big)\big|\,ds$, implying that for almost all $\omega$ this integral converges to 0 as $n \to \infty$. With the bound of equation (19) this implies that
\[
\sup_{t \in [0,\tau]} \big|X^{(n)}(t) - X(t)\big| \to 0 \quad \text{a.s.} \qquad (24)
\]
Since the almost sure limit of a sequence of random variables is measurable if the $\sigma$-algebra is complete, measurability of $X(t)$ follows immediately from measurability of the $X^{(n)}(t)$.



9.13 Conclusion 

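We show that since $X^{(n)}(t) \sim Y^{(t)}$ given $\bar Z^{(n)}_t$ (see Section 9.9) and $X^{(n)}(t) \to X(t)$ a.s. (see Section 9.12), $X(t) \sim Y^{(t)}$ given $\bar Z_t$. This completes the proof.

Because of Lemma A.10 in Appendix A, $X(t) \sim Y^{(t)}$ given $\bar Z_t$ if
\[
E\big[f(X(t)) \,\big|\, \bar Z_t\big] - E\big[f(Y^{(t)}) \,\big|\, \bar Z_t\big] = 0 \quad \text{a.s.}
\]
for every bounded Lipschitz continuous function $f : \mathbb{R} \to \mathbb{R}$. Suppose without loss of generality that $f$ is bounded by 1 and has Lipschitz constant $L$. Then, using the triangle inequality,
\[
\Big|E\big[f(X(t)) \,\big|\, \bar Z_t\big] - E\big[f(Y^{(t)}) \,\big|\, \bar Z_t\big]\Big|
\le \Big|E\big[f(X(t)) \,\big|\, \bar Z_t\big] - E\big[f(X(t)) \,\big|\, \bar Z^{(n)}_t\big]\Big|
\]
\[
\qquad + \Big|E\big[f(X(t)) \,\big|\, \bar Z^{(n)}_t\big] - E\big[f(X^{(n)}(t)) \,\big|\, \bar Z^{(n)}_t\big]\Big|
+ \Big|E\big[f(X^{(n)}(t)) \,\big|\, \bar Z^{(n)}_t\big] - E\big[f(Y^{(t)}) \,\big|\, \bar Z_t\big]\Big|.
\]
Because of Jensen's inequality, the second term is bounded by $E\big[\,|f(X(t)) - f(X^{(n)}(t))| \,\big|\, \bar Z^{(n)}_t\big]$, which is bounded by $E\big[L|X(t) - X^{(n)}(t)| \wedge 2 \,\big|\, \bar Z^{(n)}_t\big]$ since $f$ is Lipschitz continuous with Lipschitz constant $L$ and bounded by 1. Because $X^{(n)}(t) \sim Y^{(t)}$ given $\bar Z^{(n)}_t$, the third term is equal to
\[
\Big|E\big[f(Y^{(t)}) \,\big|\, \bar Z^{(n)}_t\big] - E\big[f(Y^{(t)}) \,\big|\, \bar Z_t\big]\Big|.
\]
We thus find that
\[
\Big|E\big[f(X(t)) \,\big|\, \bar Z_t\big] - E\big[f(Y^{(t)}) \,\big|\, \bar Z_t\big]\Big|
\le \Big|E\big[f(X(t)) \,\big|\, \bar Z_t\big] - E\big[f(X(t)) \,\big|\, \bar Z^{(n)}_t\big]\Big|
+ E\big[L|X(t) - X^{(n)}(t)| \wedge 2 \,\big|\, \bar Z^{(n)}_t\big]
\]
\[
\qquad + \Big|E\big[f(Y^{(t)}) \,\big|\, \bar Z^{(n)}_t\big] - E\big[f(Y^{(t)}) \,\big|\, \bar Z_t\big]\Big| \quad \text{a.s.} \qquad (25)
\]
We show that the right hand side of equation (25) converges in probability to zero. On the first and the last term we can apply Lévy's Upward Theorem (see e.g. Williams, 1991, page 134), since the integrands are bounded by 1. Lévy's Upward Theorem leads to
\[
E\big[f(X(t)) \,\big|\, \bar Z^{(n)}_t\big] \to E\Big[f(X(t)) \,\Big|\, \sigma\Big(\bigcup_{n} \sigma\big(\bar Z^{(n)}_t\big)\Big)\Big] \quad \text{a.s.}
\]
and
\[
E\big[f(Y^{(t)}) \,\big|\, \bar Z^{(n)}_t\big] \to E\Big[f(Y^{(t)}) \,\Big|\, \sigma\Big(\bigcup_{n} \sigma\big(\bar Z^{(n)}_t\big)\Big)\Big] \quad \text{a.s.}
\]
as $n \to \infty$. Thus, with Lemma A.11 in Appendix A, both the first and the last term of equation (25) converge to 0 almost surely. The second term converges to 0 in probability since it is almost surely non-negative and its expectation converges to 0:
\[
E\,E\big[L|X(t) - X^{(n)}(t)| \wedge 2 \,\big|\, \bar Z^{(n)}_t\big] = E\big(L|X(t) - X^{(n)}(t)| \wedge 2\big) \to 0
\]
because of the Lebesgue Dominated Convergence Theorem and the fact that $X^{(n)}(t)$ converges almost surely to $X(t)$.

Thus $\big|E[f(X(t)) \,|\, \bar Z_t] - E[f(Y^{(t)}) \,|\, \bar Z_t]\big|$ is bounded by a stochastic variable which converges in probability to 0. This implies that it is almost surely equal to 0. With Lemma A.10 we conclude that indeed $X(t)$ mimics $Y^{(t)}$ in the sense that $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$.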



9.14 Mimicking counterfactual outcomes: discrete-continuous time 

In some situations one may know that there are some specific times $t$ with $P(t \text{ is a jump time of } Z) > 0$. If there are no more than finitely many such times $t$, the conditions in Section 9.2 (especially the differentiability conditions) can be adapted appropriately so that the conclusion that $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$ remains true. The proof can be adapted by adding the finitely many times for which $P(t \text{ is a jump time of } Z) > 0$ to the grid, for each $n$. Just the proof of Lemma A.11 has to be adapted. But that is easy since if all $t$ for which $P(t \text{ is a jump time of } Z) > 0$ are on the grid, then also for these $t$, $\sigma(\bar Z^{(n)}_t) \uparrow \sigma(\bar Z_t)$.



10 Mimicking counterfactual survival outcomes 
10.1 Introduction 

In this section we prove that $X(t)$ mimics $Y^{(t)}$ in the sense that $X(t)$ has the same distribution as $Y^{(t)}$ given the covariate- and treatment history $\bar Z_t$, under conditions which do not exclude survival. The conditions are similar to the ones in Section 9 but adapted to survival as the outcome of interest. The proof also follows roughly the same lines as the one for other outcomes, but some changes are necessary.

If covariates and treatment are measured at time $t$, it cannot be avoided to include in $\bar Z_t$ whether or not a patient was alive at time $t$: what are a patient's covariates if he or she is dead? Therefore we include in $Z(t)$ an indicator for whether or not a patient is alive at time $t$. Thus if a patient died at or before time $t$ his or her survival time can be read from $\bar Z_t$.

The conditions in Section 9 usually exclude survival as the outcome of interest, since if the outcome is survival the Support Condition 9.1, saying that all $F_{Y^{(t+h)}|\bar Z_t}$ have the same bounded support $[y_1,y_2]$, will not hold. The reason for this is as follows. $\bar Z_t$ includes the covariate-measurements and treatment until time $t$. Given that a patient is dead at time $t$ and given his or her survival time, the distribution of this survival time cannot have the fixed support $[y_1,y_2]$, which is independent of $t$. Also given that a patient is alive at time $t$ this is hardly ever the case; one often expects that $t$ is the left limit of the support.

As compared to Section 9 we make two extra assumptions. The first is a consistency assumption, stating that stopping treatment after death does not change the survival time. The second extra assumption states that there is no instantaneous effect of treatment at the time the patient died (notice that the difference between $Y^{(Y)}$, the outcome with treatment stopped at the survival time $Y$, and $Y$ is in treatment at $Y$).

Assumption 10.1 (consistency). $Y^{(t)} = Y$ on $\{\omega : Y < t\} \cup \{\omega : Y^{(t)} < t\}$.

Assumption 10.2 (no instantaneous effect of treatment at the time the patient died). $Y^{(t)} = Y$ on $\{\omega : Y = t\} \cup \{\omega : Y^{(t)} = t\}$.

These assumptions assure that treatment in the future does not cause or prevent death at present:

Lemma 10.3 Under Assumptions 10.1 (consistency) and 10.2 (no instantaneous effect of treatment at the time the patient died),

a) For all $h \ge 0$: $Y^{(t+h)} = Y$ on $\{\omega : Y \le t\} \cup \bigcup_{h \ge 0} \{\omega : Y^{(t+h)} \le t\}$.

b) For all $(y,t,h)$ with $y \le t+h$ and $h \ge 0$: $\{\omega : Y^{(t+h)} \le y\} = \{\omega : Y \le y\}$.

Proof. a): From Assumptions 10.1 and 10.2 we immediately have $Y^{(t)} = Y$ on $\{\omega : Y \le t\} \cup \{\omega : Y^{(t)} \le t\}$. Thus if $Y \le t$ then $Y^{(t)} = Y$, and moreover for all $h > 0$, $Y < t+h$, so that from Assumption 10.1 $Y^{(t+h)} = Y$. If $Y^{(t)} \le t$ the same reasoning can be used since $Y = Y^{(t)}$ in that case. If for some $h > 0$, $Y^{(t+h)} \le t$ then also $Y^{(t+h)} < t+h$, so that from Assumption 10.1 $Y = Y^{(t+h)} \le t$, and again the same reasoning can be used.
b): For $y \le t+h$ and $h \ge 0$, $\{\omega : Y^{(t+h)} \le y\} = \{\omega : Y^{(t+h)} \le y\} \cap \{Y^{(t+h)} \le t+h\} = \{\omega : Y \le y\}$ because of a). □

Henceforth we will only use versions of conditional distributions which are consistent with Lemma 10.3 in the sense that $F_{Y^{(t+h)}|\bar Z_t}(y) = F_{Y|\bar Z_t}(y)$ for all $y \le t+h$, $h \ge 0$ and $\omega \in \Omega$.

Next we have a second look at the definition of the infinitesimal shift-function $D$,
\[
D(y,t;\bar Z_t) = \frac{\partial}{\partial h}\Big|_{h=0} \Big(F^{-1}_{Y^{(t+h)}|\bar Z_t} \circ F_{Y^{(t)}|\bar Z_t}\Big)(y),
\]
now with survival in mind as the outcome of interest. First remark that considering the interpretation of $D(y,t;\bar Z_t)$ as the infinitesimal effect of a bit of treatment directly after $t$ on survival, $D(y,t;\bar Z_t)$ should be zero if $\bar Z_t$ indicates the patient is dead at time $t$. Although in that case indeed $F_{Y^{(t+h)}|\bar Z_t}$ and $F_{Y^{(t)}|\bar Z_t}$ are almost surely the same for every $h \ge 0$, since withholding treatment after death does not change the survival time (Lemma 10.3), $F^{-1}_{Y^{(t+h)}|\bar Z_t}$ will often not exist. Therefore if $\bar Z_t$ indicates the patient is dead at time $t$ we will just formally define $D(y,t;\bar Z_t)$ to be zero.

Next, consider $y \le t$. Notice that considering the interpretation of $D(y,t;\bar Z_t)$ as the infinitesimal effect of treatment directly after $t$ on the survival-quantile $y$, $D(y,t;\bar Z_t)$ should be zero for $y \le t$ since treatment at or after $t$ should not cause or prevent death at or before $t$, so it should not affect quantiles of the survival curve before $t$. Indeed if $\bar Z_t$ indicates that the patient is alive at time $t$, $F_{Y^{(t+h)}|\bar Z_t}(y) = F_{Y|\bar Z_t}(y) = 0$ for $y \le t$ for all $h \ge 0$ because of Lemma 10.3b, so $D(y,t;\bar Z_t) = 0$ for $y < t$. However, in order to make $D$ continuous on $y \ge t$ in between the jump times of $Z$, we re-define $D(t,t;\bar Z_t) = \lim_{y \downarrow t} D(y,t;\bar Z_t)$. We will see that this limit exists under the conditions in Section 10.2. This limit need not be equal to 0. We thus get the following minor adaptation of the definition of $D$ in case the outcome is survival:
\[
D(y,t;\bar Z_t) =
\begin{cases}
0 & \text{if } \bar Z_t \text{ indicates the patient is dead at } t \text{ or } y < t, \\
\frac{\partial}{\partial h}\big|_{h=0} \big(F^{-1}_{Y^{(t+h)}|\bar Z_t} \circ F_{Y^{(t)}|\bar Z_t}\big)(y) & \text{otherwise, for } y > t, \\
\lim_{y \downarrow t} D(y,t;\bar Z_t) & \text{otherwise, for } y = t.
\end{cases}
\qquad (26)
\]
Notice that the area where $D$ is possibly non-zero is $\{(y,t) \in [0,\infty) \times [0,\min\{Y,\tau\}] : y \ge t\}$. Therefore if $Y < \tau$, the solution to the differential equation $X(t)$ is equal to $Y$ for $t \in [Y,\tau]$.
An example of such X(t) is shown in Figure 13 Sectional 



10.2 Mimicking counterfactual survival outcomes: assumptions and result 

We give precise conditions under which $X(t)$ mimics $Y^{(t)}$ in the sense that $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$, for survival outcomes. We choose versions of $F_{Y^{(t+h)}|\bar Z_t}$ which are consistent with Lemma 10.3 and which satisfy all regularity conditions below. We use these versions in the definition of $D$ of equation (26), and everywhere else in this section.

We already mentioned that it is not reasonable to assume that the conditional distribution of the survival time has the fixed support $[y_1,y_2]$ given any covariate- and treatment history $\bar Z_t$. If a patient is alive at time $t$, one often expects that $t$ is the left limit of the support. We assume that

Assumption 10.4 (support). There exists a finite number $y_2 > \tau$ such that

a) For all $\omega \in \Omega$ and $t$ with $Y > t$, all $F_{Y^{(t+h)}|\bar Z_t}$ for $h \ge 0$ have support $[t,y_2]$.

b) For all $\omega \in \Omega$ and $t$ with $Y > t$, all $F_{Y^{(t+h)}|\bar Z_t}$ for $h \ge 0$ have a continuous non-zero density $f_{Y^{(t+h)}|\bar Z_t}(y)$ on $y \in [t+h,y_2]$.

c) There exists an $\varepsilon > 0$ such that for all $\omega \in \Omega$ and $t$ with $Y > t$, $f_{Y^{(t)}|\bar Z_t}(y) > \varepsilon$ for $y \in [t,y_2]$.

Next we look at the smoothness conditions in Section 9. It does not seem reasonable to assume that $F_{Y^{(t+h)}|\bar Z_t}(y)$ is continuously differentiable with respect to $h$ and $y$ on $(h,y) \in [0,\infty) \times [t,y_2]$ since for $y \le t+h$, $F_{Y^{(t+h)}|\bar Z_t}(y) = F_{Y|\bar Z_t}(y)$ (Lemma 10.3b). Therefore the derivative of $F_{Y^{(t+h)}|\bar Z_t}(y)$ with respect to $h$ is likely not to exist at $y = t+h$ (and is equal to zero for $y < t+h$). Also the derivative of $F_{Y^{(t+h)}|\bar Z_t}(y)$ with respect to $y$ may not exist at $y = t+h$, because of the different treatment before and after $t+h$. For survival outcomes we replace the smoothness conditions 9.2-9.4 by:

Assumption 10.5 (continuous derivatives). For $\omega \in \Omega$ fixed,

a) If $Y > t$ then $F_{Y^{(t+h)}|\bar Z_t}(y)$ restricted to $\{(h,y) \in [0,\infty) \times [t,y_2] : y \ge t+h\}$ is $C^1$ in $(h,y)$.

b) If $Z$ does not jump in $(t_1,t_2)$ and $Y > t_1$ then both $\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y)$ and $\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y)$ are continuous in $(y,t)$ on $\{(y,t) \in [t_1,y_2] \times [t_1,t_2) : y \ge t\}$ and can be continuously extended to $\{(y,t) \in [t_1,y_2] \times [t_1,t_2] : y \ge t\}$.

Assumption 10.6 (bounded derivatives).

a) There exists a constant $C_1$ such that for all $t$, $h \ge 0$ and $y \ge t+h$, for $\omega \in \Omega$ with $Y > t$,
\[
\frac{\partial}{\partial y} F_{Y^{(t+h)}|\bar Z_t}(y) \le C_1.
\]

b) There exists a constant $C_2$ such that for all $t$, $h \ge 0$ and $y \ge t+h$, for $\omega \in \Omega$ with $Y > t$,
\[
\Big|\frac{\partial}{\partial h} F_{Y^{(t+h)}|\bar Z_t}(y)\Big| \le C_2.
\]

Assumption 10.7 (Lipschitz continuity).

a) There exists a constant $L_1$ such that for all $t$ and $y,z \in (t,y_2]$, for $\omega \in \Omega$ with $Y > t$,
\[
\Big|\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) - \frac{\partial}{\partial z} F_{Y^{(t)}|\bar Z_t}(z)\Big| \le L_1 |y - z|.
\]

b) There exists a constant $L_2$ such that for all $t$ and $y,z \in (t,y_2]$, for $\omega \in \Omega$ with $Y > t$,
\[
\Big|\frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) - \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(z)\Big| \le L_2 |y - z|.
\]



Assumption 10.8 (smoothness). For all $\omega \in \Omega$ and $t$ with $Y > t$, $F_{Y^{(t)}|\bar Z_t}(y)$ is continuous



m 



Theorem 10.9 Suppose that Regularity Conditions 10.4-10.8 are satisfied. Then $D(y,t;\bar Z_t)$ as defined in equation (26) exists. Furthermore for every $\omega \in \Omega$ there exists exactly one continuous solution $X(t)$ to $X'(t) = D(X(t),t;\bar Z_t)$ with final condition $X(\tau) = Y$. If also Assumptions 7.1, 10.1 and 10.2 (consistency and no instantaneous treatment effect at time of death) are satisfied then this $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$ for all $t \in [0,\tau]$.

10.2.1 Simpler regularity conditions 

We state some more restrictive but simpler regularity conditions implying all the regularity conditions in Section 10.2.

Assumption 10.10 (regularity condition).

• (support). There exists a finite number $y_2 > \tau$ such that

a) If $Y > t$, all $F_{Y^{(t+h)}|\bar Z_t}$ for $h \ge 0$ have support $[t,y_2]$.

b) If $Y > t$, all $F_{Y^{(t+h)}|\bar Z_t}$ for $h \ge 0$ have a continuous non-zero density $f_{Y^{(t+h)}|\bar Z_t}(y)$ on $y \in [t+h,y_2]$.

c) There exists an $\varepsilon > 0$ such that for all $\omega \in \Omega$ and $t$ with $Y > t$, $f_{Y^{(t)}|\bar Z_t}(y) > \varepsilon$ for $y \in [t,y_2]$.

• (smoothness). For every $\omega \in \Omega$

a) If $Z$ does not jump in $(t_1,t_2)$ and $Y > t_1$, the restriction of $(y,t,h) \mapsto F_{Y^{(t+h)}|\bar Z_t}(y)$ to $\{(y,t,h) \in [t_1,y_2] \times [t_1,t_2) \times \mathbb{R}_{\ge 0} : y \ge t+h\}$ is $C^2$ in $(y,t,h)$.

b) The derivatives of $F_{Y^{(t+h)}|\bar Z_t}(y)$ ($y \ge t+h$) with respect to $y$ and $h$ are bounded by constants $C_1$ and $C_2$, respectively.

c) $\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y)$ and $\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y)$ ($y \ge t$) have derivatives with respect to $y$ which are bounded by constants $L_1$ and $L_2$, respectively.

d) For all $\omega \in \Omega$ and $t$ with $Y > t$, $F_{Y^{(t)}|\bar Z_t}(y)$ is continuous and strictly increasing on its support $[t,y_2]$.



10.3 Outline of the proof 

The proof of Theorem 10.9 follows the same lines as the proof of Theorem 9.5. The numbering of the sections here is the same as in Section 9, to make references and comparisons easy. We indicate how the proof of Theorem 9.5 can be adapted in order to prove Theorem 10.9. We only include those sections which need to be adapted.

Notice that there is one essential difference between survival outcomes and non-survival outcomes: if $\bar Z_t$ indicates the patient is alive at $t$, $X(t)$ should be above $t$, if we ever want $X(t)$ to have the same distribution as $Y^{(t)}$ given $\bar Z_t$ ($Y^{(t)} > t$ in that case because of Consistency Assumption 10.1). This leads to an extra problem in the proof for the continuous-time case, namely: how to prove that the solution stays above the line $y = t$ for $t \in [0,Y]$? We solve this problem in Section 10.5 by showing that, under the assumptions of Section 10.2, $D(t,t;\bar Z_t) \le 1$.

10.4 Existence of and a different expression for D 

If $Y \le t$, $D(y,t;\bar Z_t) = 0$ by definition (26). Thus we can concentrate on $\omega \in \Omega$ with $Y > t$. If $y > t$, Corollary 9.8 can be applied on $F_h(y) = F_{Y^{(t+h)}|\bar Z_t}(y)$ with $y_0 = y$ and $U_{0,y_0} \cap \{h \ge 0\} = [0,y-t) \times (t,y_2]$, because of Assumptions 10.5 and 10.4b. Thus for $y > t$, $D$ as defined in equation (26) exists, and it is equal to
\[
D(y,t;\bar Z_t) = \frac{\partial}{\partial h}\Big|_{h=0} \Big(F^{-1}_{Y^{(t+h)}|\bar Z_t} \circ F_{Y^{(t)}|\bar Z_t}\Big)(y)
= -\frac{\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y)}{\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y)}. \qquad (27)
\]
$D(t,t;\bar Z_t)$ is by definition (26) equal to the limit as $y \downarrow t$ of this $D(y,t;\bar Z_t)$, which exists because of Assumptions 10.4 and 10.5.



10.5 Existence and uniqueness of $X(t)$

If $\bar Z_t$ indicates the patient is alive at $t$ and $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$, we should have that $X(t)$ stays above $t$ ($Y^{(t)} > t$ in that case because of Consistency Assumption 10.1). In order to prove that $X(t)$ stays indeed above $t$ if the patient is alive at $t$, we prove that $D(t,t;\bar Z_t) \le 1$.

Lemma 10.11 Under Assumptions 10.4 and 10.5a, if $Y > t$ then $D(t,t;\bar Z_t)$ is less than or equal to 1.

Proof. We start with some ideas, which we make precise below. Intuition says that $D$ not only measures the increase of quantiles when treatment is prolonged but also the decrease of quantiles when treatment is withheld. Thus quantiles $y$ seem to approximately move to $y - hD(y,t;\bar Z_t)$ when treatment is withheld between $t$ and $t+h$. If quantiles near $t$ move down to $t$ with speed greater than 1 when treatment is withheld starting from $h$ it seems like these quantiles will end up below $t$ when treatment is withheld at $t$. However, if treatment is withheld starting from $t$ this does not cause death at or before $t$, so the quantiles above $t$ should stay above $t$. This leads to a contradiction. The following makes this precise.

Fix $\omega \in \Omega$ and $t$ for which $Y > t$. Recall from Section 10.4 that $a := \lim_{y \downarrow t} D(y,t;\bar Z_t)$ exists. We need to prove that $a \le 1$. Suppose that $a > 1$. We show that this leads to a contradiction.

Notice that because of the chain rule, for $y > t+h$,
\[
\frac{\partial}{\partial h} \Big(F^{-1}_{Y^{(t)}|\bar Z_t} \circ F_{Y^{(t+h)}|\bar Z_t}\Big)(y)
= \frac{\frac{\partial}{\partial h} F_{Y^{(t+h)}|\bar Z_t}(y)}{f_{Y^{(t)}|\bar Z_t}\Big(F^{-1}_{Y^{(t)}|\bar Z_t} \circ F_{Y^{(t+h)}|\bar Z_t}(y)\Big)} \qquad (28)
\]
exists and is continuous in $(h,y)$ for $y > t+h$ with a continuous extension to $\{(h,y) \in [0,\infty) \times [t,y_2] : y \ge t+h\}$ because of Assumptions 10.5b and 10.4b. Notice that for $h = 0$ this expression is equal to $-D(y,t;\bar Z_t)$ because of expression (27) for $D$. Thus the limit of (28) for $h = 0$ and $y \downarrow t$ is equal to $-a$. This can be compared with the intuitive idea that quantiles $y$ approximately move to $y - hD(y,t;\bar Z_t)$ when treatment is withheld between $t$ and $t+h$.

Now choose $\delta = (a-1)/2$, which is greater than 0 since we assumed that $a > 1$. By continuity of (28) in $(h,y)$ there exists an open neighbourhood $U_{(0,t)}$ of $(0,t)$ such that for
\[
\{(h,y) \in U_{(0,t)} : y \ge t+h \text{ and } h \ge 0\}
\]
the expression (28) above is not further than $\delta$ away from $-a$. Thus there also exist $h_0 > 0$ and $y_0 > t$ such that for $h \in [0,h_0]$, $y \le y_0$ and $y \ge t+h$, (28) is not further than $\delta$ away from $-a$. Choose $h_1 \in (0,h_0]$ with $t + (1+\delta)h_1 \le y_0$, and put $y_1 = t + (1+\delta)h_1$. Notice that since $y_1 > t+h_1$,
\[
F^{-1}_{Y^{(t)}|\bar Z_t} \circ F_{Y^{(t+h_1)}|\bar Z_t}(y_1) > t
\]
(informally this is about withholding treatment in the future not causing death at present, which we wanted to use; formally this follows e.g. from Assumption 10.4a and b). Moreover, for $y = y_1$, the derivative (28) exists on $h \in [0,h_1]$, since for $h \in [0,h_1]$, $y_1 = t + (1+\delta)h_1 > t+h_1 \ge t+h$. Thus by Taylor expansion there exists an $\tilde h_1 \in [0,h_1]$ with
\[
F^{-1}_{Y^{(t)}|\bar Z_t} \circ F_{Y^{(t+h_1)}|\bar Z_t}(y_1) = y_1 + h_1 \frac{\partial}{\partial h}\Big|_{h=\tilde h_1} \Big(F^{-1}_{Y^{(t)}|\bar Z_t} \circ F_{Y^{(t+h)}|\bar Z_t}\Big)(y_1).
\]
Combining this we find that
\[
t < y_1 + h_1 \frac{\partial}{\partial h}\Big|_{h=\tilde h_1} \Big(F^{-1}_{Y^{(t)}|\bar Z_t} \circ F_{Y^{(t+h)}|\bar Z_t}\Big)(y_1)
\]
for some $\tilde h_1 \in [0,h_1]$. Rewriting this leads to
\[
-h_1 \frac{\partial}{\partial h}\Big|_{h=\tilde h_1} \Big(F^{-1}_{Y^{(t)}|\bar Z_t} \circ F_{Y^{(t+h)}|\bar Z_t}\Big)(y_1) < y_1 - t = (1+\delta)h_1. \qquad (29)
\]
For $h = \tilde h_1$, (28) is not further than $\delta$ away from $-a$, since $\tilde h_1 \in [0,h_0]$, $y_1 \le y_0$ and $y_1 > t+h_1 \ge t+\tilde h_1$, so that
\[
\Big|\frac{\partial}{\partial h}\Big|_{h=\tilde h_1} \Big(F^{-1}_{Y^{(t)}|\bar Z_t} \circ F_{Y^{(t+h)}|\bar Z_t}\Big)(y_1) + a\Big| \le \delta.
\]
Therefore the expression on the left-hand side of (29) lies in the interval $[(a-\delta)h_1,(a+\delta)h_1]$. Equation (29) thus implies that $(a-\delta)h_1 < (1+\delta)h_1$, so $a < 1+2\delta$, so $(a-1)/2 < \delta$. This is in contradiction with our choice of $\delta$, which was $\delta = (a-1)/2$. □

Fix $\omega$ for the rest of this section. Just as in Section 9.5 it suffices to prove existence and uniqueness of $X(t)$ with final condition on any interval between jumps of $Z$, because with probability one $Z$ jumps only finitely many times. Hence suppose that $Z$ does not jump in $(t_1, t_2)$ and that $t_1$ is either a jump time of $Z$ or $0$ and that $t_2$ is either a jump time of $Z$ or $\tau$.

If $\bar Z_{t_1}$ indicates that the patient is dead at $t_1$, $D$ is identically $0$ on $[t_1, \tau]$ and $X(t)$ exists, $X(t)$ is unique, and $X(t)$ is identically $Y$ on $[t_1, \tau]$.

If $\bar Z_{t_1}$ indicates that the patient is alive at $t_1$, we use Corollary B.5 to prove existence and uniqueness of $X(t)$. Notice that $D(y, t; \bar Z_t)$ is continuous on $\{(y,t) \in [t_1, y_2] \times [t_1, t_2) : y > t\}$ because of equation (27) and Assumptions 10.5 and 10.4. However, the differential equation with $X(t)$ has a final condition at the upper end of the interval $[t_1, t_2)$. Just as in Section 9.5 we define the continuous extension $\tilde D$ of $D$ on $\{(y,t) \in [t_1, y_2] \times [t_1, t_2] : y \geq t\}$, which exists because of Assumption 10.4 and the extension-assumption in Assumption 10.5. Just as in Section 9.5, $\tilde D$ is Lipschitz continuous in $y$ on $\{(y,t) \in [t_1, y_2] \times [t_1, t_2) : y > t\}$ with Lipschitz constant $L_2/\varepsilon + C_2 L_1/\varepsilon^2$. The same constant works on $\{(y,t) \in [t_1, y_2] \times [t_1, t_2] : y \geq t\}$ by continuity. Because of Lemma 10.11 above, $\tilde D(t, t; \bar Z_t) \leq 1$ for all $t$. Thus from Corollary B.5 we have existence and uniqueness of a continuous solution $X(t)$ to the differential equation with $\tilde D$, with $X(t) > t$ if $Y > t$.
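
The role of the final condition can be illustrated numerically. The following is a minimal sketch and not part of the original argument: it integrates a differential equation $X'(t) = D(X(t), t)$ backwards from a final condition $X(\tau) = Y$ with a classical fourth-order Runge-Kutta scheme. The function `toy_d`, the horizon `tau` and the value `y_final` are hypothetical choices made only for illustration; `toy_d` is Lipschitz in $y$, but it is not derived from a survival model and is not claimed to satisfy the assumptions of this section.

```python
def toy_d(y, t):
    # Hypothetical right-hand side D(y, t); Lipschitz in y on [0, tau].
    return y / (1.0 + t)


def solve_backward(d, y_final, tau, steps=1000):
    """Integrate X'(t) = d(X(t), t) from t = tau down to t = 0 with RK4."""
    h = -tau / steps  # negative step: start at the final condition, move backwards
    x, t = y_final, tau
    path = [(t, x)]
    for _ in range(steps):
        k1 = d(x, t)
        k2 = d(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = d(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = d(x + h * k3, t + h)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
        path.append((t, x))
    return path


tau, y_final = 2.0, 3.0
path = solve_backward(toy_d, y_final, tau)
x_at_zero = path[-1][1]
# For this toy D the exact solution is X(t) = Y * (1 + t) / (1 + tau).
exact_at_zero = y_final / (1.0 + tau)
```

Because the right-hand side is Lipschitz in $y$, the backward solution is the only one that ends in the prescribed final value, which is the uniqueness property the corollary is used for.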



10.6 Mimicking counterfactual survival: discrete time 

In this section we consider the situation where $Z$, the available information on the treatment- and covariate process, can be fully described by its values at finitely many fixed points $0 = \tau_0 < \tau_1 < \tau_2 < \ldots < \tau_K < \tau_{K+1} = \tau$. At any time at which a patient's covariates are measured we should include whether or not a patient was alive at that time. Hence we assume that $\bar Z_t$ includes whether or not a patient was alive at $\tau_1, \ldots, \tau_{p(t)}$, with $\tau_{p(t)}$ the last $\tau_k$ before or at time $t$.

For simplicity we pose differentiability conditions and restrictions on the support of $Y^{(t)}$ given $\bar Z_{\tau_k}$ for $t \geq \tau_k$ similar to the continuous-time case. Notice that if $\tau_k$ is the lower support limit of $F_{Y|\bar Z_{\tau_k}}$ then $\tau_k$ is also the lower support limit of $F_{Y^{(t)}|\bar Z_{\tau_k}}$ for all $t > \tau_k$: if $\tau_k$ is the lower support limit of $F_{Y|\bar Z_{\tau_k}}$ then, because of Lemma 10.3, for all $h > 0$ and all $\delta > 0$, $P(\{Y^{(\tau_k+h)} < \tau_k - \delta\}) = P(\{Y < \tau_k - \delta\}) = 0$, and, again because of Lemma 10.3, for all $h > 0$ and all $0 < \delta < h$, $P(\{Y^{(\tau_k+h)} < \tau_k + \delta\}) = P(\{Y < \tau_k + \delta\}) > 0$. In most cases $\tau_k$ will then also be the lower support limit of $F_{Y^{(\tau_k)}|\bar Z_{\tau_k}}$, unless by stopping medication at time $\tau_k$ the patient stays alive with probability one for a certain period of time, while if medication is not stopped at time $\tau_k$ the hazard of dying is non-zero immediately after time $\tau_k$. For the same reasons as in the continuous-time case, we assume differentiability only for $y \geq t$. We replace Assumption 9.9 for survival outcomes by:

Assumption 10.12 (smoothness). Suppose that there exists a $y_2 > \tau$ such that for $k = 0, \ldots, K$ and $t \in [\tau_k, \tau_{k+1}]$ there exist conditional distribution functions $F_{Y^{(t)}|\bar Z_{\tau_k}}$ which are consistent with Lemma 10.3 and such that if $\bar Z_{\tau_k}$ indicates that the patient is alive at time $\tau_k$:

a) For every $t \in [\tau_k, \tau_{k+1}]$, $F_{Y^{(t)}|\bar Z_{\tau_k}}$ has support $[\tau_k, y_2]$.

b) $F_{Y^{(t)}|\bar Z_{\tau_k}}(y)$ is continuous in $(y,t)$ on $(y,t) \in \mathbb{R} \times [\tau_k, \tau_{k+1}]$.

c) $F_{Y^{(t)}|\bar Z_{\tau_k}}(y)$ is $C^1$ in $(y,t)$ on $(y,t) \in [\tau_k, y_2] \times [\tau_k, \tau_{k+1}] : y > t$ with a $C^1$ extension to $(y,t) \in [\tau_k, y_2] \times [\tau_k, \tau_{k+1}] : y \geq t$.

d) For $t \in [\tau_k, \tau_{k+1})$, $\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_{\tau_k}}(y)$ is strictly positive on $y \in [t, y_2]$.

Throughout Section 10.6 we use fixed versions of $F_{Y^{(t)}|\bar Z_{\tau_k}}(y)$ satisfying Assumption 10.12.

Since $\bar Z_t$ contains the same information as $\bar Z_{\tau_k}$ for $t \in [\tau_k, \tau_{k+1})$, we can and will choose the same versions when conditioning on $\bar Z_t$.

In this discrete-time case $\bar Z_t$ contains no indicator for death or alive at time $t$ except if $t$ is one of the $\tau_k$'s. However, also in this case $X(t)$ should be above $t$ for $t < Y$: for such $Y$, $X(t)$ should not play the role of $Y^{(t)}$'s less than $t$. The reason for this is, intuitively, that if $Y < t$, $Y^{(t)} = Y < t$, and if also $Y$'s greater than $t$ would play this role there would be too many of them. We will show explicitly that there exists a solution $X(t)$ with $X(t) > t$ for $Y > t$. Hence the following analogue of Theorem 9.10 for survival outcomes:

Theorem 10.13 Suppose that the treatment- and covariate process $Z$ can be fully described by its values at finitely many fixed points $0 = \tau_0 < \tau_1 < \tau_2 < \ldots < \tau_K < \tau_{K+1} = \tau$, and suppose also that Assumption 10.12 is satisfied. Then $D(y, t; \bar Z_t)$ exists for all $t$. Furthermore if also Assumptions 7.1, 10.1 and 10.2 (consistency and no instantaneous treatment effect at time of death) are satisfied, then there exists a continuous solution $X(t)$ to $X'(t) = D(X(t), t; \bar Z_t)$ with final condition $X(\tau) = Y$ and with $X(t) > t$ if $Y > t$, for which $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$.



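Proof. For $t \in [\tau_k, \tau_{k+1})$ and $y > t$, Corollary 9.8 can be applied on $F_h(y) = F_{Y^{(t+h)}|\bar Z_{\tau_k}}(y)$ with $y_0 = y$, because of Assumptions 10.12 c and d. Thus $D(y, t; \bar Z_t)$ as defined above exists for $y > t$ and

$$D(y, t; \bar Z_t) = -\frac{\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_{\tau_k}}(y)}{\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_{\tau_k}}(y)}. \tag{30}$$

By definition, $D(t, t; \bar Z_t)$ is equal to the limit of (30) for $y \downarrow t$, which exists because of Assumptions 10.12 c and d.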
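
As a numerical sanity check of the quotient form of $D$, the sketch below compares two computations for a toy family of distribution functions: the derivative in $h$ at $h = 0$ of $h \mapsto F^{-1}_{t+h}(F_t(y))$, and minus the $h$-derivative of $F_{t+h}(y)$ divided by the $y$-derivative of $F_t(y)$. The family `cdf` (exponential with scale $1 + s$) and the names `d_by_definition` and `d_by_quotient` are assumptions made for illustration only; this family is not a survival model satisfying Assumption 10.12.

```python
import math


def cdf(s, y):
    # Hypothetical family F_s(y) = 1 - exp(-y / (1 + s)), y >= 0 (illustration only).
    return 1.0 - math.exp(-y / (1.0 + s))


def quantile(s, u):
    # Inverse of cdf(s, .) on (0, 1).
    return -(1.0 + s) * math.log(1.0 - u)


def d_by_definition(y, t, eps=1e-5):
    # Central difference in h of h -> F_{t+h}^{-1}(F_t(y)) at h = 0.
    u = cdf(t, y)
    return (quantile(t + eps, u) - quantile(t - eps, u)) / (2.0 * eps)


def d_by_quotient(y, t, eps=1e-5):
    # Minus (h-derivative of F_{t+h}(y) at h = 0) over (y-derivative of F_t(y)).
    dh = (cdf(t + eps, y) - cdf(t - eps, y)) / (2.0 * eps)
    dy = (cdf(t, y + eps) - cdf(t, y - eps)) / (2.0 * eps)
    return -dh / dy


y, t = 1.5, 0.4
lhs = d_by_definition(y, t)
rhs = d_by_quotient(y, t)
closed_form = y / (1.0 + t)  # derivative worked out by hand for this toy family
```

Both numbers agree with the hand-computed value up to discretisation error, as the implicit function theorem suggests they should.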

Under Assumption 10.12 we can explicitly write down a solution to the differential equation $X'(t) = D(X(t), t; \bar Z_t)$ with final condition $X(\tau) = Y$, as follows. For the moment consider $\omega \in \Omega$ fixed. Define $\tau_{p(\omega)}$ as the last $\tau_k$ before the survival time $Y$. In the following, $F_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}}$ will denote the distribution function $F_{Y^{(t)}|\bar Z_{\tau_k}}$ for which $k = p(\omega)$. It will not denote a distribution function conditional on $p(\omega)$. We define $X(t)$ as follows. For $t \geq \tau_{p(\omega)+1}$ we define $X(t) = Y$. For $t < \tau_{p(\omega)+1}$, $t \in [\tau_k, \tau_{k+1})$, we define

$$X(t) = F^{-1}_{Y^{(t)}|\bar Z_{\tau_k}} \circ F_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}}\big(X(\tau_{k+1})\big). \tag{31}$$

This $X(t)$ is well-defined under Assumption 10.12 a and b.

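We first show that if $Y > t$ then also $X(t)$ as defined in equation (31) is greater than $t$. We start by considering $t \in [\tau_{p(\omega)}, Y) \cap [0, \tau]$. For such $t$, $Y > t$, so

$$F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}(Y) > F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}(t),$$

since because of Assumption 10.12 a, $F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}$ is strictly increasing on its support $[\tau_{p(\omega)}, y_2]$, which includes both $t$ and $Y$. Because of Lemma 10.3 the right-hand side of this expression is equal to $F_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}}(t)$. Hence

$$F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}(Y) > F_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}}(t),$$

so that, since $F^{-1}_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}}$ is strictly increasing on $[0, 1]$ (Assumption 10.12 a),

$$X(t) = F^{-1}_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}} \circ F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}(Y) > F^{-1}_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}} \circ F_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}}(t).$$

The right-hand side is equal to $t$, since $t \in [\tau_{p(\omega)}, y_2]$ and $F_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}}$ is strictly increasing on $[\tau_{p(\omega)}, y_2]$ (Assumption 10.12 a). Thus indeed $X(t) > t$ for $t \in [\tau_{p(\omega)}, Y)$.

To show that $X(t)$ as defined in equation (31) is also greater than $t$ for other $t < Y$ we use induction, starting with $k = p(\omega) - 1$ and ending with $k = 0$. We thus need to prove that if $t \in [\tau_k, \tau_{k+1})$ and $X(\tau_{k+1}) > \tau_{k+1}$ then $X(t) > t$. Thus suppose that $t \in [\tau_k, \tau_{k+1})$ and that $X(\tau_{k+1}) > \tau_{k+1}$. Notice that $X(\tau_{k+1}) > \tau_{k+1} > t$, so

$$F_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}}\big(X(\tau_{k+1})\big) > F_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}}(t),$$

since because of Assumption 10.12 a, $F_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}}$ is strictly increasing on its support $[\tau_k, y_2]$, which includes both $t$ and $X(\tau_{k+1})$. Because of Lemma 10.3 the right-hand side of this expression is equal to $F_{Y^{(t)}|\bar Z_{\tau_k}}(t)$. Hence

$$F_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}}\big(X(\tau_{k+1})\big) > F_{Y^{(t)}|\bar Z_{\tau_k}}(t),$$

whence, since $F^{-1}_{Y^{(t)}|\bar Z_{\tau_k}}$ is strictly increasing on $[0, 1]$ (Assumption 10.12 a),

$$X(t) = F^{-1}_{Y^{(t)}|\bar Z_{\tau_k}} \circ F_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}}\big(X(\tau_{k+1})\big) > F^{-1}_{Y^{(t)}|\bar Z_{\tau_k}} \circ F_{Y^{(t)}|\bar Z_{\tau_k}}(t).$$

The right-hand side of this expression is equal to $t$, since $t \in [\tau_k, y_2]$ and $F_{Y^{(t)}|\bar Z_{\tau_k}}$ is strictly increasing on $[\tau_k, y_2]$ (Assumption 10.12 a). We conclude that indeed $X(t) > t$ if $Y > t$.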
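Next we show that $X(t)$ as defined in equation (31) is a continuous solution to $X'(t) = D(X(t), t; \bar Z_t)$ with final condition $X(\tau) = Y$. We first consider $t \geq \tau_{p(\omega)+1}$. For these $t$, $D(y, t; \bar Z_t) = 0$, so $X(t)$ should be equal to $Y$; and indeed $X(t)$ as defined in equation (31) is equal to $Y$. For $t \in [Y, \tau_{p(\omega)+1})$ we also have $D(y, t; \bar Z_t) = 0$, so $X(t)$ should be equal to $Y$. Because of Lemma 10.3,

$$X(t) = F^{-1}_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}} \circ F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}(Y) = F^{-1}_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}} \circ F_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}}(Y) = Y,$$

so $X(t)$ as defined in equation (31) is indeed equal to $Y$.

To show that $X(t)$ as defined in equation (31) satisfies $X'(t) = D(X(t), t; \bar Z_t)$ on $[\tau_{p(\omega)}, Y)$, notice that for $h$ small, since $F_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}}$ is continuous (Assumption 10.12 b),

$$X(t + h) = F^{-1}_{Y^{(t+h)}|\bar Z_{\tau_{p(\omega)}}} \circ F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}(Y) = F^{-1}_{Y^{(t+h)}|\bar Z_{\tau_{p(\omega)}}} \circ F_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}}\big(X(t)\big).$$

This expression is differentiable at $h = 0$ with derivative $D(X(t), t; \bar Z_t)$, since, as we showed before, $X(t) > t$. Thus indeed $X(t)$ as defined in equation (31) satisfies $X'(t) = D(X(t), t; \bar Z_t)$ on $[\tau_{p(\omega)}, Y)$.

We still need to prove continuity of $X(t)$ as defined in equation (31) at $t = Y$, but it is easier to show continuity on $[\tau_{p(\omega)}, \tau_{p(\omega)+1}]$, so we show continuity of $X(t)$ on $[\tau_{p(\omega)}, \tau_{p(\omega)+1}]$. According to Van der Vaart (1998), Lemma 21.2, $F_n$ converges weakly to $F$ if and only if $F_n^{-1}(t) \to F^{-1}(t)$ at every $t$ where $F^{-1}$ is continuous. Notice that because of Assumption 10.12 b, $F_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}}$ converges weakly to $F_{Y^{(t_0)}|\bar Z_{\tau_{p(\omega)}}}$ as $t \in [\tau_{p(\omega)}, \tau_{p(\omega)+1}] \to t_0$, for any $t_0 \in [\tau_{p(\omega)}, \tau_{p(\omega)+1}]$. Moreover, because of Assumption 10.12 a, $F^{-1}_{Y^{(t_0)}|\bar Z_{\tau_{p(\omega)}}}$ is continuous on $[0, 1]$. Hence $F^{-1}_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}}(x) \to F^{-1}_{Y^{(t_0)}|\bar Z_{\tau_{p(\omega)}}}(x)$ as $t \in [\tau_{p(\omega)}, \tau_{p(\omega)+1}] \to t_0$, for every $x \in [0, 1]$. Thus also

$$F^{-1}_{Y^{(t)}|\bar Z_{\tau_{p(\omega)}}} \circ F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}(Y) \to F^{-1}_{Y^{(t_0)}|\bar Z_{\tau_{p(\omega)}}} \circ F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}(Y)$$

as $t \in [\tau_{p(\omega)}, \tau_{p(\omega)+1}] \to t_0$. For $t_0 \in [\tau_{p(\omega)}, \tau_{p(\omega)+1})$, the right-hand side of this expression is equal to $X(t_0)$, which implies continuity of $X(t)$ as defined in equation (31) on $[\tau_{p(\omega)}, \tau_{p(\omega)+1})$. For $t_0 = \tau_{p(\omega)+1}$, the right-hand side of this expression is equal to

$$F^{-1}_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}} \circ F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}(Y),$$

which is equal to $Y$ since $Y$ is in the support of $F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}$ (Assumption 10.12 a) and $F_{Y^{(\tau_{p(\omega)+1})}|\bar Z_{\tau_{p(\omega)}}}$ is strictly increasing on its support (Assumption 10.12 a). That implies continuity of $X(t)$ as defined in equation (31) in $\tau_{p(\omega)+1}$.

That also for $k < p(\omega)$, $X(t)$ as defined in equation (31) satisfies $X'(t) = D(X(t), t; \bar Z_t)$ on $[\tau_k, \tau_{k+1})$ and that $X(t)$ is continuous on $[\tau_k, \tau_{k+1}]$ follows the same way as in the previous paragraph, starting from the fact that for $t \in [\tau_k, \tau_{k+1})$, $X(t) = F^{-1}_{Y^{(t)}|\bar Z_{\tau_k}} \circ F_{Y^{(\tau_{k+1})}|\bar Z_{\tau_k}}(X(\tau_{k+1}))$.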
Next we prove that $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$. For $t = \tau$, $X(\tau) = Y$, so that $X(\tau)$ has the same distribution as $Y^{(\tau)}$ given $\bar Z_\tau$ because of Assumption 7.1 (consistency). For the induction step, suppose that for $t \in [\tau_k, \tau]$ (for $k = K + 1$ read $t = \tau$), $X(t)$ has the same distribution as $Y^{(t)}$ given $\bar Z_t$. To show: for $t \in [\tau_{k-1}, \tau_k)$, $X(t)$ has distribution function $F_{Y^{(t)}|\bar Z_{\tau_{k-1}}}$ given $\bar Z_{\tau_{k-1}}$, so also given $\bar Z_t$. If $\bar Z_{\tau_{k-1}}$ indicates that the patient is dead at $\tau_{k-1}$ then $X(t) = Y = Y^{(t)}$ because of Lemma 10.3, so certainly $X(t) \sim Y^{(t)}$ given $\bar Z_{\tau_{k-1}}$. If $\bar Z_{\tau_{k-1}}$ indicates that the patient is alive at $\tau_{k-1}$, then

$$X(t) = F^{-1}_{Y^{(t)}|\bar Z_{\tau_{k-1}}} \circ F_{Y^{(\tau_k)}|\bar Z_{\tau_{k-1}}}\big(X(\tau_k)\big),$$

and the rest of the proof can be copied from the proof of Theorem 9.10. □

Notice that $X(t)$ is equal to the observed survival time $Y$ if $t > Y$. According to Lemma 10.3, $Y^{(t)} = Y$ for $t > Y$, so for $t > Y$ we have that $X(t) = Y^{(t)}$.
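
The building block of the construction in the proof is a quantile-to-quantile map of the form $F_b^{-1} \circ F_a$, which sends a draw from $F_a$ to a draw from $F_b$ while preserving order. A minimal simulation sketch follows; the two exponential laws (rates 2.0 and 0.5), the sample size and the seed are hypothetical stand-ins chosen only because their distribution functions invert in closed form, and they are not the conditional laws of the theorem.

```python
import math
import random


def cdf_exp(rate, y):
    # Distribution function of an exponential law with the given rate.
    return -math.expm1(-rate * y) if y > 0.0 else 0.0


def quantile_exp(rate, u):
    # Inverse distribution function on [0, 1).
    return -math.log1p(-u) / rate


def quantile_map(y, rate_from, rate_to):
    # F_to^{-1} o F_from: sends the u-quantile of one law to the u-quantile of the other.
    return quantile_exp(rate_to, cdf_exp(rate_from, y))


rng = random.Random(0)
ys = [rng.expovariate(2.0) for _ in range(20000)]  # draws from the "from" law
xs = [quantile_map(y, 2.0, 0.5) for y in ys]       # mapped draws
mean_xs = sum(xs) / len(xs)                        # mean of the "to" law is 1 / 0.5
order_kept = all(
    quantile_map(a, 2.0, 0.5) < quantile_map(b, 2.0, 0.5)
    for a, b in [(0.1, 0.2), (0.5, 1.0), (1.0, 3.0)]
)
```

The mapped sample has, up to simulation error, the mean of the target law, and the map is strictly increasing; these are the two properties the proof exploits when it shows $X(t) > t$ and $X(t) \sim Y^{(t)}$.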



10.7 Discretization and choices of conditional distributions 

The construction of the $\bar Z^{(n)}_t$ can be copied from Section 9.7. Notice that, by construction, $\bar Z^{(n)}_{\tau_k}$ includes whether or not a patient is alive at $\tau_k$.

Notation 10.14 At this point we choose conditional distributions $P_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}$. Also we choose

$$F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y) = \int F_{Y^{(t)}|\bar Z_{\tau_k}=z}(y) \, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z) \tag{32}$$

to be the version of the conditional distribution function of $Y^{(t)}$ given $\bar Z^{(n)}_{\tau_k}$ which is used in the rest of the proof, and similarly for $Y$ instead of $Y^{(t)}$. If $s \in (\tau_k, \tau_{k+1})$ we take the same version for $F_{Y^{(t)}|\bar Z^{(n)}_s}$; this is possible because in that case $\bar Z^{(n)}_s$ contains the same information as $\bar Z^{(n)}_{\tau_k}$.

These distributions are consistent with Lemma 10.3 in the sense that for $y \leq t$ and for all $\omega \in \Omega$, $F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y) = F_{Y|\bar Z^{(n)}_{\tau_k}}(y)$. This follows immediately from the fact that all $F_{Y^{(t)}|\bar Z_{\tau_k}=z}$ are consistent with Lemma 10.3 in this sense.
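
The choice of versions above amounts to averaging the fine conditional distribution functions against the conditional law of the fine information given the coarse information. A minimal discrete sketch follows, in which the integral becomes a weighted sum. The weights, the rates and the exponential form of $F_{Y|Z=z}$ are hypothetical; the last lines check numerically that the density of the mixture equals the mixture of the densities, i.e. that differentiation and integration can be interchanged in this simple setting.

```python
import math


def cdf_given_z(rate, y):
    # Hypothetical conditional distribution function F_{Y|Z=z}; z is encoded by its rate.
    return -math.expm1(-rate * y) if y > 0.0 else 0.0


def pdf_given_z(rate, y):
    # Derivative in y of cdf_given_z.
    return rate * math.exp(-rate * y) if y > 0.0 else 0.0


def mixture(fn, weights, rates, y):
    # Discrete analogue of integrating fn(z, y) against dP(z | coarse information).
    return sum(w * fn(r, y) for w, r in zip(weights, rates))


weights = [0.25, 0.75]  # hypothetical P(Z = z | coarse information)
rates = [1.0, 3.0]      # hypothetical parameters of F_{Y|Z=z}
y, eps = 0.8, 1e-6
coarse_cdf = mixture(cdf_given_z, weights, rates, y)
coarse_pdf = mixture(pdf_given_z, weights, rates, y)
numeric_pdf = (mixture(cdf_given_z, weights, rates, y + eps)
               - mixture(cdf_given_z, weights, rates, y - eps)) / (2.0 * eps)
```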

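10.8 Existence of and two expressions for $D^{(n)}$

We prove existence of $D^{(n)}$ as defined in equation (20) for the discretized situation of Section 10.7. Moreover, as in Section 9.8 we derive two useful formulas for $D^{(n)}$.

The same way as in Section 9.8 we find that for $t \geq \tau_k$ and $y \in (t, y_2]$, if $\bar Z^{(n)}_{\tau_k}$ indicates the patient is alive at $\tau_k$,

$$\frac{\partial}{\partial h}\Big|_{h=0} \Big( F^{-1}_{Y^{(t+h)}|\bar Z^{(n)}_{\tau_k}} \circ F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}} \Big)(y) = -\frac{\int \frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_{\tau_k}=z}(y) \, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)}{\int \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_{\tau_k}=z}(y) \, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)}.$$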

The limit for y [t exists because of Assumption EESt and Assumptions ll().T)l and [T().4b (the 
proof is the same as the proof for continuity of this expression in (y, t) in Section . Hence 
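with the versions of $F_{Y^{(t)}|\bar Z^{(n)}_s}$ chosen in Notation 10.14,

$$D^{(n)}\big(y, t; \bar Z^{(n)}_t\big) = \begin{cases} 0 & \text{if } \bar Z^{(n)}_t \text{ indicates the patient is dead at } t \text{ or } y < t, \\ \frac{\partial}{\partial h}\big|_{h=0} \Big( F^{-1}_{Y^{(t+h)}|\bar Z^{(n)}_t} \circ F_{Y^{(t)}|\bar Z^{(n)}_t} \Big)(y) & \text{otherwise, for } y > t, \\ \lim_{y \downarrow t} D^{(n)}\big(y, t; \bar Z^{(n)}_t\big) & \text{otherwise, for } y = t \end{cases} \tag{33}$$

exists for every $t$. Moreover, for $t \in [\tau_k, \tau_{k+1})$ and $y > t$ we find that

$$D^{(n)}\big(y, t; \bar Z^{(n)}_t\big) = -\frac{\int \frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_{\tau_k}=z}(y) \, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)}{\int \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_{\tau_k}=z}(y) \, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)}. \tag{34}$$

This expression is similar to expression (12) for $D^{(n)}$ for non-survival outcomes.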

Expression (13) for $D^{(n)}$ for non-survival outcomes takes a different form for survival outcomes. Just as for expression (13) we restrict to the $\Omega'$ defined in Section 9.8, a set of probability one on which conditional probabilities given $\bar Z^{(n)}_t$ are uniquely defined. For $n$ and $\omega \in \Omega'$ such that $\bar Z^{(n)}_t$ indicates the patient is alive at the last $\tau_k$ at or before time $t$, we will show that for $y > t$,

$$D^{(n)}\big(y, t; \bar Z^{(n)}_t\big) = -\frac{E\Big[ 1_{\{\text{alive at } t\}} \frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t \Big]}{E\Big[ 1_{\{\text{alive at } t\}} \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t \Big]}. \tag{35}$$

The indicator of being alive at time $t$ is new as compared to the non-survival case of Section 9.

We prove equation (35) as follows. We restrict to $\omega \in \Omega'$. Suppose that $\bar Z^{(n)}_{\tau_k}$ indicates the patient is alive at $\tau_k$ and suppose that $t \geq \tau_k$. Then for $h \geq 0$ and $y > t + h$,

$$P\big(Y^{(t+h)} \leq y \,\big|\, \bar Z^{(n)}_t\big) = P\big(Y^{(t+h)} \leq y, \, Y \leq t \,\big|\, \bar Z^{(n)}_t\big) + \int 1_{\{\text{alive at } t\}} F_{Y^{(t+h)}|\bar Z_t = z_t}(y) \, dP_{\bar Z_t|\bar Z^{(n)}_t}(z_t).$$

Given that $Y \leq t$, Lemma 10.3 gives that $Y^{(t+h)} = Y \leq t$, and, since $y > t$, also $Y^{(t+h)} \leq t < y$. We conclude that

$$P\big(Y^{(t+h)} \leq y \,\big|\, \bar Z^{(n)}_t\big) = P\big(Y \in (\tau_k, t] \,\big|\, \bar Z^{(n)}_t\big) + \int 1_{\{\text{alive at } t\}} F_{Y^{(t+h)}|\bar Z_t = z_t}(y) \, dP_{\bar Z_t|\bar Z^{(n)}_t}(z_t). \tag{36}$$

To derive equation (35) from equation (36), we apply Corollary 9.8 on

$$F_h(y) = P\big(Y \in (\tau_k, t] \,\big|\, \bar Z^{(n)}_t\big) + \int 1_{\{\text{alive at } t\}} F_{Y^{(t+h)}|\bar Z_t = z_t}(y) \, dP_{\bar Z_t|\bar Z^{(n)}_t}(z_t),$$

for $y > t$, with $y_0 = y$. We check the conditions of Corollary 9.8. Just as in Section 9.8 we find that on $y > t + h$, $h \geq 0$, $F_h(y)$ is differentiable with respect to $y$ and $h$ with derivatives

$$\int 1_{\{\text{alive at } t\}} \frac{\partial}{\partial y} F_{Y^{(t+h)}|\bar Z_t = z_t}(y) \, dP_{\bar Z_t|\bar Z^{(n)}_t}(z_t) \quad \text{and} \quad \int 1_{\{\text{alive at } t\}} \frac{\partial}{\partial h} F_{Y^{(t+h)}|\bar Z_t = z_t}(y) \, dP_{\bar Z_t|\bar Z^{(n)}_t}(z_t),$$

respectively. Also the same way as in Section 9.8 we find that these derivatives are continuous in $(y, h)$. To prove that $F_0'(y)$ is non-zero we can use Assumption 10.4 b, if the probability that the patient is alive at $t$ given $\bar Z^{(n)}_t$ is non-zero. Indeed the probability that the patient is alive at $t$ given $\bar Z^{(n)}_t$ is non-zero, since given any $\bar Z_{\tau_k}$ indicating that the patient is not dead at $\tau_k$, $Y$, which has the same distribution as $Y^{(\tau)}$ given $\bar Z_{\tau_k}$ because of Assumption 7.1, has support $[\tau_k, y_2]$ (Assumption 10.4 a). Thus the conditions of Corollary 9.8 are satisfied, and equation (35) follows from equation (36).

10.9 Applying the discrete-time result 

We prove the following lemma: 

Lemma 10.15 Suppose that Regularity Conditions 10.4–10.8 and Assumptions 7.1, 10.1 and 10.2 (consistency and no instantaneous treatment effect at time of death) are satisfied. Then for every $n$ there exists a continuous solution $X^{(n)}(t)$ to the differential equation in the discretised setting,

$$X^{(n)\prime}(t) = D^{(n)}\big(X^{(n)}(t), t; \bar Z^{(n)}_t\big),$$

with final condition $X^{(n)}(\tau) = Y$. $X^{(n)}(t)$ is almost surely unique. Moreover, $X^{(n)}(t) > t$ if $Y > t$. Furthermore, $X^{(n)}(t)$ has the same conditional distribution as $Y^{(t)}$ given $\bar Z^{(n)}_t$.

The proof of this lemma is different from the proof in Section 9.9 because of the different assumptions for the discrete-time case if the outcome is survival.

Proof. Fix $n$. As before we drop the superscript $(n)$ for $\tau^{(n)}_k$, for notational simplicity. We first show that there exists a continuous solution $X^{(n)}$ with $X^{(n)}(t) > t$ if $Y > t$ for which $X^{(n)}(t)$ has the same conditional distribution as $Y^{(t)}$ given $\bar Z^{(n)}_t$, using Theorem 10.13. We thus need to check that the conditional distributions $F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}$ of $Y^{(t)}$ given $\bar Z^{(n)}_{\tau_k}$ chosen in Notation 10.14 satisfy Assumption 10.12. The versions chosen in Notation 10.14 are

$$F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y) = \int F_{Y^{(t)}|\bar Z_{\tau_k}=z}(y) \, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z).$$

In Section 10.7 we already saw that these distributions are consistent with Lemma 10.3. We check Assumption 10.12 a-d. If $\bar Z^{(n)}_{\tau_k}$ indicates that the patient is alive at time $\tau_k$ then:

a) $F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}$ has support $[\tau_k, y_2]$ since all $F_{Y^{(t)}|\bar Z_{\tau_k}=z}$ have support $[\tau_k, y_2]$ (Assumption 10.4 a).

b) $F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y) = F_{Y^{(\tau_k + (t - \tau_k))}|\bar Z^{(n)}_{\tau_k}}(y)$ is continuous in $(y,t)$ on $(y,t) \in \mathbb{R} \times [\tau_k, \tau_{k+1}] : y \geq t$ because of Assumption 10.5 a and Lebesgue's dominated convergence theorem, since all $F_{Y^{(t)}|\bar Z_{\tau_k}=z}$ are bounded by 1. For $y < t$, $F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y) = F_{Y|\bar Z^{(n)}_{\tau_k}}(y)$ because of Lemma 10.3, which is continuous in $(y,t)$ because of Assumption 10.8.

c) $F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y) = F_{Y^{(\tau_k + (t - \tau_k))}|\bar Z^{(n)}_{\tau_k}}(y)$ is $C^1$ in $(y,t)$ on $(y,t) \in [\tau_k, y_2] \times [\tau_k, \tau_{k+1}] : y \geq t$ because all $F_{Y^{(t)}|\bar Z_{\tau_k}=z}(y)$ are $C^1$ there (Assumption 10.5 a), and all derivatives are bounded there (Assumption 10.5 b), from which it follows, with the same reasoning as for $F_{Y^{(t)}|\bar Z^{(n)}_t}$ in Section 9.8, that integration and differentiation can be interchanged here.

d) Under c) we saw that for $y \geq t$,

$$\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z^{(n)}_{\tau_k}}(y) = \int \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_{\tau_k}=z}(y) \, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z).$$

Because of Assumption 10.4 b this is greater than 0.

□



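10.9.1 Continuity of $D^{(n)}$ between the grid points

We prove continuity of $D^{(n)}$ on $\{(y,t) \in (\tau_k, y_2] \times (\tau_k, \tau_{k+1}) : y > t\}$ with a continuous extension to $\{(y,t) \in [\tau_k, y_2] \times [\tau_k, \tau_{k+1}] : y \geq t\}$ using equation (34). It has an obvious extension to $\{(y,t) \in [\tau_k, y_2] \times [\tau_k, \tau_{k+1}] : y > t\}$,

$$-\frac{\int \frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_{\tau_k}=z}(y) \, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)}{\int \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_{\tau_k}=z}(y) \, dP_{\bar Z_{\tau_k}|\bar Z^{(n)}_{\tau_k}}(z)},$$

which as we saw in Section 10.8 is continuous and has a continuous extension to $(y,t) \in [\tau_k, y_2] \times [\tau_k, \tau_{k+1}] : y \geq t$. We call this extension $\tilde D^{(n)}$.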

10.10 Bounding the difference between $X$ and $X^{(n)}$ in terms of $D$ and $D^{(n)}$

$X^{(n)}$ and $X$ satisfy the differential equations with these continuous extensions $\tilde D^{(n)}$ and $\tilde D$, respectively, on the closed intervals $[t_1, t_2]$ as in Section 9.10, because of the remark above Theorem 10.10 and since if $Y > t$, both $X(t) > t$ (Section 10.5) and $X^{(n)}(t) > t$ (Section 10.9). $\tilde D$ is Lipschitz continuous in $y$ on these intervals, as we saw at the end of Section 10.5. Therefore we get in a similar way as in Section 9.10, but with Corollary B.5 instead of Corollary B.4 and with final condition at $\min\{\tau, Y\}$ instead of at $\tau$, that almost surely

$$\big|X^{(n)}(t) - X(t)\big| \leq \int_t^{\min\{\tau, Y\}} e^{C(s-t)} \big| D\big(X^{(n)}(s), s\big) - D^{(n)}\big(X^{(n)}(s), s\big) \big| \, ds \tag{37}$$

and

$$\sup_{t \in [0, \tau]} \big|X^{(n)}(t) - X(t)\big| \leq \int_0^{\min\{\tau, Y\}} e^{Cs} \big| D\big(X^{(n)}(s), s\big) - D^{(n)}\big(X^{(n)}(s), s\big) \big| \, ds, \tag{38}$$

with $C = L_2/\varepsilon + C_2 L_1/\varepsilon^2$.

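10.11 Convergence of $D^{(n)}$ to $D$

In this section we prove that for all $(y,t)$ fixed, $D^{(n)}(y, t; \bar Z^{(n)}_t)$ converges almost surely to $D(y, t; \bar Z_t)$. For $\bar Z_t$ indicating the patient is dead at time $t$, $D(y, t; \bar Z_t) = 0$ and for $n$ large enough $\bar Z^{(n)}_t$ indicates the patient is dead at the last $\tau_k$ at or before time $t$ and thus also $D^{(n)}(y, t; \bar Z^{(n)}_t) = 0$, so $D^{(n)}$ converges to $D$. For $y < t$, $D(y, t; \bar Z_t)$ and all $D^{(n)}(y, t; \bar Z^{(n)}_t)$ are 0. Therefore it suffices to consider $y \geq t$ and $\omega$ and $t$ for which $\bar Z_t$ indicates the patient is alive at time $t$. We start by proving that for $y > t$, $D^{(n)}(y, t; \bar Z^{(n)}_t)$ converges almost surely to $D(y, t; \bar Z_t)$. From (27) we know that if $\bar Z_t$ indicates the patient is alive at time $t$, for $y > t$

$$D(y, t; \bar Z_t) = -\frac{\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y)}{\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y)}$$

and from (35), since the patient is alive at the last $\tau_k$ at or before time $t$, for $y > t$

$$D^{(n)}\big(y, t; \bar Z^{(n)}_t\big) = -\frac{E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t \Big]}{E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t \Big]}.$$

We will apply Lévy's Upward Theorem (see e.g. Williams (1991), page 134), which is allowed since $\frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y)$ and $\frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y)$ are bounded because of Assumption 10.5. Lévy's Upward Theorem leads to

$$E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t \Big] \to E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \sigma\big(\cup_{n=1}^{\infty} \bar Z^{(n)}_t\big) \Big] \quad \text{a.s.} \tag{39}$$

and

$$E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t \Big] \to E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \sigma\big(\cup_{n=1}^{\infty} \bar Z^{(n)}_t\big) \Big] \quad \text{a.s.} \tag{40}$$

Replacing the conditioning on $\sigma(\cup_{n=1}^{\infty} \bar Z^{(n)}_t)$ by conditioning on $\bar Z_t$ in (39) and (40) is allowed because of Lemma A.11. Since

$$E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \bar Z_t \Big] = 1_{\text{alive}}(t) \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y)$$

and

$$E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \bar Z_t \Big] = 1_{\text{alive}}(t) \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y),$$

this implies that for $y > t$ fixed

$$D^{(n)}\big(y, t; \bar Z^{(n)}_t\big) \to D(y, t; \bar Z_t) \quad \text{a.s.} \tag{41}$$

To prove that also

$$D^{(n)}\big(t, t; \bar Z^{(n)}_t\big) \to D(t, t; \bar Z_t) \quad \text{a.s.} \tag{42}$$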
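we wish to use Lemma D.1. To do that we need Lipschitz continuity of all $D^{(n)}$ and $D$ in $y$ with the same Lipschitz constant, and if that is the case (42) follows. That $D$ is Lipschitz continuous in $y$ with Lipschitz constant $\frac{1}{\varepsilon} L_2 + \frac{C_2}{\varepsilon^2} L_1$ we saw in Section 10.5. For $D^{(n)}(y, t; \bar Z^{(n)}_t)$ we can concentrate on $\bar Z^{(n)}_t$ indicating the patient is alive at the last $\tau_k$ at or before time $t$, since otherwise $D^{(n)}$ is identically 0 in $y$. Using Lemma C.1 and (35),

$$D^{(n)}\big(y, t; \bar Z^{(n)}_t\big) = -\frac{E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial h}\big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t \Big]}{E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial y} F_{Y^{(t)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t \Big]},$$

we will prove that for every $n$, $D^{(n)}$ is Lipschitz continuous in $y$ with Lipschitz constant $\frac{1}{\varepsilon} L_2 + \frac{C_2}{\varepsilon^2} L_1$ on the $\Omega' \subset \Omega$ belonging to $\bar Z^{(n)}_t$ (which is a set of probability one). The numerator of (35) here is Lipschitz continuous in $y$ with Lipschitz constant $P(Y > t \,|\, \bar Z^{(n)}_t) \cdot L_2$ on $(t, y_2]$, since for $y, z \in (t, y_2]$

$$\begin{aligned} &\Big| E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) \,\Big|\, \bar Z^{(n)}_t \Big] - E\Big[ 1_{\text{alive}}(t) \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(z) \,\Big|\, \bar Z^{(n)}_t \Big] \Big| \\ &\quad = \Big| E\Big[ 1_{\text{alive}}(t) \Big( \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) - \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(z) \Big) \,\Big|\, \bar Z^{(n)}_t \Big] \Big| \\ &\quad \leq E\Big[ 1_{\text{alive}}(t) \Big| \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(y) - \frac{\partial}{\partial h}\Big|_{h=0} F_{Y^{(t+h)}|\bar Z_t}(z) \Big| \,\Big|\, \bar Z^{(n)}_t \Big] \\ &\quad \leq E\Big[ 1_{\text{alive}}(t) \, L_2 |y - z| \,\Big|\, \bar Z^{(n)}_t \Big] \\ &\quad = P\big(Y > t \,\big|\, \bar Z^{(n)}_t\big) \cdot L_2 |y - z| \end{aligned}$$

because of Assumption 10.7 b. The same way but with Assumption 10.7 a we see that the denominator of (35) is Lipschitz continuous in $y$ with Lipschitz constant $P(Y > t \,|\, \bar Z^{(n)}_t) \cdot L_1$ on $(t, y_2]$. The numerator is bounded in absolute value by $P(Y > t \,|\, \bar Z^{(n)}_t) \cdot C_2$ because of Assumption 10.5 b and the denominator is bounded from below by $P(Y > t \,|\, \bar Z^{(n)}_t) \cdot \varepsilon$ because of Assumption 10.4 c. $P(Y > t \,|\, \bar Z^{(n)}_t)$ is greater than 0, since given $\bar Z^{(n)}_t$ which indicates the patient is alive at $\tau_{k(t)}$, the last $\tau_k$ at or before time $t$, $Y^{(\tau)}$ has support $[\tau_{k(t)}, y_2]$ (Assumption 10.4 a) and thus so has $Y$ (Assumption 7.1).

Hence Lemma C.1 leads to $D^{(n)}$ being, for $\omega \in \Omega'$, Lipschitz continuous in $y$ on $(t, y_2]$ with Lipschitz constant $\frac{1}{\varepsilon} L_2 + \frac{C_2}{\varepsilon^2} L_1$. The same constant works on $[t, y_2]$ by continuity. Thus, since $\Omega'$ has probability one, (42) follows.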
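
The Lipschitz constant of the quotient can be checked on a toy example. The sketch below uses the elementary bound $|a/b - a'/b'| \leq |a - a'|/b + |a'|\,|b - b'|/(b b')$, which for a numerator bounded by $C_2$ with Lipschitz constant $L_2$ and a denominator bounded below by $\varepsilon$ with Lipschitz constant $L_1$ gives the constant $L_2/\varepsilon + C_2 L_1/\varepsilon^2$; that this is what Lemma C.1 states is an assumption here, since the lemma itself is not reproduced in this section. The functions `num` and `den` are hypothetical.

```python
import math


def num(y):
    # |num| <= C2 = 1 and Lipschitz with constant L2 = 1.
    return math.sin(y)


def den(y):
    # den >= eps = 1 and Lipschitz with constant L1 = 1 (derivative is -sin(2y)).
    return 1.5 + 0.5 * math.cos(2.0 * y)


L1, L2, C2, eps = 1.0, 1.0, 1.0, 1.0
bound = L2 / eps + C2 * L1 / eps ** 2

grid = [i * 0.01 for i in range(629)]  # grid on roughly [0, 2*pi]
worst_slope = max(
    abs(num(a) / den(a) - num(b) / den(b)) / (b - a)
    for a, b in zip(grid, grid[1:])
)
```

In the text the common factor $P(Y > t \,|\, \bar Z^{(n)}_t)$ appears in all four constants and cancels in this bound, which is why the constant does not depend on $n$.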

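Remark that because $D^{(n)}(y, t; \bar Z^{(n)}_t)$ is Lipschitz continuous in $y$ with Lipschitz constant $\frac{1}{\varepsilon} L_2 + \frac{C_2}{\varepsilon^2} L_1$ for every $t$ and $\omega \in \Omega'$, we get from the remark after Corollary B.5 that for $\omega \in \Omega'$, so almost surely, $X^{(n)}$ is the unique solution to

$$X^{(n)\prime}(t) = D^{(n)}\big(X^{(n)}(t), t; \bar Z^{(n)}_t\big)$$

with final condition $X^{(n)}(\tau) = Y$.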

10.12 $X^{(n)}(t)$ converges to $X(t)$ and $X(t)$ is measurable

Equations (|3H1), (EH and (gH) are the starting point here. If F > t, both X{t) > t {Sec- 
tion lTn3|) and > t (Section [TITl?]) . so that the rest of the proof can be copied from 

Section im 



10.13 Conclusion 

This section can be copied from Section 9.13.



10.14 Mimicking counterfactual outcomes: discrete-continuous time 

This section can be copied from Section 9.14.






11 Discussion 



It would be interesting to investigate whether the Support Condition 9.1 or 10.4 can be weakened, for example to an assumption about the support varying in a differentiable way between the jump times of the covariate- and treatment process $Z$. We expect that in that case one has to assume that where $Z$ jumps the support of $Y^{(t)}$ given $\bar Z_t$ gets smaller or stays the same as $t$ increases (see Figure 6). Otherwise $X(t)$ may move out of the support of $Y^{(t)}$ given $\bar Z_t$ (recall that $X$ is the solution to a differential equation with final condition). It is reasonable to assume that the support of $Y^{(t)}$ given $\bar Z_t$ gets smaller or stays the same as $t$ increases, since more information about $Z$ should not enlarge the range of $Y^{(t)}$.

Figure 6: Example of support of $Y^{(t)}$ given $\bar Z_t$. (Horizontal axis: $t$, with a jump time of $Z$ marked; vertical axis: support of $Y^{(t)}$ given $\bar Z_t$.)

A problem which may occur without a support condition is that the denominator in the quotient expression for $D$, or in a quotient expression for $D^{(n)}$ such as equation (12), may tend to 0, which may "blow up" $D$ or $D^{(n)}$. In that case it might help to assume:

Assumption 11.1 There exists a constant $C$ such that

a) for all $\omega \in \Omega$, $t$ and $y$,

$$F^{-1}_{Y^{(t+h)}|\bar Z_t} \circ F_{Y^{(t)}|\bar Z_t}(y) - y \leq C \cdot h,$$

b) for all $t$, $y$ and $B \subset \bar{\mathcal{Z}}_t$ with $P(\bar Z_t \in B) > 0$,

$$F^{-1}_{Y^{(t+h)}|\bar Z_t \in B} \circ F_{Y^{(t)}|\bar Z_t \in B}(y) - y \leq C \cdot h.$$

That is, quantiles of the outcome of interest do not move more than $C \cdot h$ when treatment is stopped at time $t + h$ instead of $t$, both given that the treatment- and covariate process is $\bar Z_t$ and given that it satisfies $\bar Z_t \in B$ for some $B$ with $P(\bar Z_t \in B) > 0$. This assumption does not look unreasonable if there is no "instantaneous treatment effect". It is to be expected that under this assumption both $D$ and $D^{(n)}$ are bounded by $C$.
APPENDIX



A Some facts about conditioning 

The following definition and two theorems on existence and uniqueness of conditional distributions can be found in Bauer (1972), Section 10.3, in a different formulation. The first is a definition of conditional distributions. A conditional distribution of a random variable $X$ with values in $\mathbb{R}$ is more than just a set of conditional probabilities $P(X \leq x \,|\, \mathcal{G})$ for $x \in \mathbb{R}$: it is also a probability measure on $\mathbb{R}$. Conditional probabilities always exist; a conditional distribution always exists e.g. if $X$ takes values in $\mathbb{R}$, but not in general. Conditional probabilities are almost surely unique; under conditions the same is true for conditional distributions.

Definition A.1 Let $X : (\Omega, \mathcal{F}) \to (\mathcal{X}, \mathcal{A})$ be a random variable on a probability space $(\Omega, \mathcal{F}, P)$ with values in a measurable space $(\mathcal{X}, \mathcal{A})$. Let $\mathcal{G} \subset \mathcal{F}$ be a sub-$\sigma$-algebra. Then $P_{X|\mathcal{G}} : \Omega \times \mathcal{A} \to \mathbb{R}$ is a conditional distribution of $X$ given $\mathcal{G}$ if

a) $\forall A \in \mathcal{A}$: $\omega \mapsto P_{X|\mathcal{G}}(\omega, A)$ is a version of $P(X \in A \,|\, \mathcal{G})$, i.e. it is $\mathcal{G}$-measurable and $\forall G \in \mathcal{G}$:

$$\int_G P_{X|\mathcal{G}}(\omega, A) \, dP(\omega) = \int_G 1_A(X) \, dP = P\big(G \cap X^{-1}(A)\big).$$

b) $P_{X|\mathcal{G}}(\omega, \cdot)$ is a probability measure on $(\mathcal{X}, \mathcal{A})$.

If $X$ takes values in $\mathbb{R}$ the distribution function belonging to the probability measure $P_{X|\mathcal{G}}$ will often be denoted by $F_{X|\mathcal{G}}$.

Theorem A.2 Let $X : (\Omega, \mathcal{F}) \to (\mathcal{X}, \mathcal{A})$ be a random variable on a probability space $(\Omega, \mathcal{F}, P)$ with values in a measurable space $(\mathcal{X}, \mathcal{A})$. Suppose that $\mathcal{A}$ is a countably generated $\sigma$-algebra and $\mathcal{G} \subset \mathcal{F}$ is a sub-$\sigma$-algebra. If $P_{X|\mathcal{G}}$ and $P^*_{X|\mathcal{G}}$ are conditional distributions of $X$ given $\mathcal{G}$ then they are almost surely the same in the sense that there exists a $P$-null set $N \in \mathcal{F}$ such that for all $\omega \in \Omega \setminus N$ and all $A \in \mathcal{A}$,

$$P_{X|\mathcal{G}}(\omega, A) = P^*_{X|\mathcal{G}}(\omega, A).$$

A topological space $E$ is called Polish if it has a countable dense subset and there exists a metric that generates the topology and for which the space is complete. An example of a Polish space is $\mathbb{R}^k$ with the usual topology.

Theorem A.3 Let $X : (\Omega, \mathcal{F}) \to (E, \mathcal{B}(E))$ be a random variable on a probability space $(\Omega, \mathcal{F}, P)$ with values in a Polish space $E$ with its Borel-$\sigma$-algebra. Then for every $\sigma$-algebra $\mathcal{G} \subset \mathcal{F}$ there exists a conditional distribution $P_{X|\mathcal{G}}$.

The next theorem is very useful in combination with Theorem A.3. Suppose that $Z$ is a random variable on $(\Omega, \mathcal{F})$ with values in the space $D[a,b]$ equipped with the $\sigma$-algebra generated by the coordinate projections. Then Theorems A.3 and A.4 imply that for any $\sigma$-algebra $\mathcal{G} \subset \mathcal{F}$, $Z$ has a conditional distribution given $\mathcal{G}$.
Theorem A.4 Suppose that $a, b \in \mathbb{R}$ are finite. Then $D[a, b]$ with the Skorohod topology is a Polish space. Furthermore, the $\sigma$-algebra on $D[a, b]$ generated by the Skorohod topology is the same as the $\sigma$-algebra on $D[a, b]$ generated by the coordinate projections.

The first statement of this theorem can be found in Billingsley (1968), Chapter 3; the second statement is Theorem 14.5 in the same book.

The next lemma is an easy consequence of the existence of conditional distributions: 

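Lemma A.5 Let $X$ and $Y$ be random variables on a probability space $(\Omega, \mathcal{F}, P)$ with values in $(\mathbb{R}, \mathcal{B})$, with $\mathcal{B}$ the Borel-$\sigma$-algebra on $\mathbb{R}$. Suppose that $\mathcal{G} \subset \mathcal{F}$ is a sub-$\sigma$-algebra. Then there exist conditional distributions $P_{X|\mathcal{G}}$ and $P_{Y|\mathcal{G}}$. If moreover for every $x \in \mathbb{Q}$, $P(X \leq x \,|\, \mathcal{G}) = P(Y \leq x \,|\, \mathcal{G})$ a.s., then $P_{X|\mathcal{G}} = P_{Y|\mathcal{G}}$ a.s. in the sense that there exists a $P$-null set $N \in \mathcal{F}$ such that for all $\omega \in \Omega \setminus N$ and all $B \in \mathcal{B}$,

$$P_{X|\mathcal{G}}(\omega, B) = P_{Y|\mathcal{G}}(\omega, B).$$

Proof. Existence of conditional distributions follows from Theorem A.3 since $(\mathbb{R}, \mathcal{B})$ is a Polish space. Furthermore a probability measure on $(\mathbb{R}, \mathcal{B})$ is completely determined by its values on $(-\infty, x]$ for $x \in \mathbb{Q}$. So it is enough to prove that there exists a $P$-null set $N \in \mathcal{F}$ such that

$$\forall \omega \in \Omega \setminus N \ \ \forall x \in \mathbb{Q}: \quad P_{X|\mathcal{G}}(\omega, (-\infty, x]) = P_{Y|\mathcal{G}}(\omega, (-\infty, x]). \tag{43}$$

But for every $x \in \mathbb{Q}$,

$$\begin{aligned} P_{X|\mathcal{G}}(\omega, (-\infty, x]) &= P(X \leq x \,|\, \mathcal{G}) \quad \text{a.s.} \\ &= P(Y \leq x \,|\, \mathcal{G}) \quad \text{a.s.} \\ &= P_{Y|\mathcal{G}}(\omega, (-\infty, x]) \quad \text{a.s.} \end{aligned}$$

Put

$$N = \bigcup_{x \in \mathbb{Q}} \big\{ \omega : P_{X|\mathcal{G}}(\omega, (-\infty, x]) \neq P_{Y|\mathcal{G}}(\omega, (-\infty, x]) \big\}.$$

This is a countable union of null sets, so a null set, and it satisfies (43). □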

For the proof of Lemma A.8 and Lemma A.9 we use the following two lemmas. The first is well-known.

Lemma A.6 If $P_{X|\mathcal{G}}$ is a conditional distribution of $X$ given $\mathcal{G}$ then

$$E[f(X) \,|\, \mathcal{G}] = \int f(x) \, dP_{X|\mathcal{G}}(x) \quad \text{a.s.}$$

Lemma A.7 Suppose that $Y$ and $Z$ are random variables on a probability space $(\Omega, \mathcal{F}, P)$ with values in Polish spaces $(\mathcal{Y}, \mathcal{A}_1)$ and $(\mathcal{Z}, \mathcal{A}_2)$, respectively. Then

$$(\omega, A) \mapsto \int_A \delta_{Z(\omega), z'} \, P_{Y|Z = Z(\omega)}(dy) \, dz'$$

$: \Omega \times \sigma(\mathcal{A}_1 \times \mathcal{A}_2) \to \mathbb{R}$ is a version of $P_{(Y,Z)|Z}$, i.e. it is a conditional distribution of $(Y, Z)$ given $Z$.
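Proof. Put $P(\omega, A) = \int_A \delta_{Z(\omega), z'} \, P_{Y|Z=Z(\omega)}(dy) \, dz'$. Conditions a and b of Definition A.1 have to be checked for $P$. Condition b: for $\omega$ fixed it is indeed a probability measure on $((\mathcal{Y} \times \mathcal{Z}), \sigma(\mathcal{A}_1 \times \mathcal{A}_2))$ (concentrated on $z' = Z(\omega)$).

Condition a: we first show that for any $A$ of the form $A_1 \times A_2$ with $A_1 \in \mathcal{A}_1$ and $A_2 \in \mathcal{A}_2$, $\omega \mapsto P(\omega, A_1 \times A_2)$ is a version of $P((Y, Z) \in (A_1 \times A_2) \,|\, Z)$. Equivalently, for $A$ of the form $A_1 \times A_2$ and $G \in \sigma(Z)$, so $G$ of the form $Z^{-1}(B)$ with $B \in \mathcal{A}_2$,

$$\int_G P(\omega, A) \, dP(\omega) = \int_G 1_A((Y, Z)) \, dP = P\big(G \cap (Y, Z)^{-1}(A)\big).$$

This can be shown as follows:

$$\begin{aligned} \int_{Z^{-1}(B)} P(\omega, A_1 \times A_2) \, dP(\omega) &= \int_B \Big( \int_{A_1 \times A_2} \delta_{z, z'} \, P_{Y|Z=z}(dy') \, dz' \Big) dP_Z(z) \\ &= \int_{B \cap A_2} \Big( \int_{A_1} P_{Y|Z=z}(dy') \Big) dP_Z(z) \\ &= \int_{B \cap A_2} P(Y \in A_1 \,|\, Z = z) \, dP_Z(z) \\ &= P\big(Y^{-1}(A_1) \cap Z^{-1}(B \cap A_2)\big) \\ &= P\big(Z^{-1}(B) \cap (Y, Z)^{-1}(A_1 \times A_2)\big). \end{aligned}$$

Next we show that this is sufficient. Notice first that since $((\mathcal{Y} \times \mathcal{Z}), \sigma(\mathcal{A}_1 \times \mathcal{A}_2))$ is a Polish space, there exists a conditional distribution $P_{(Y,Z)|Z}$. We show that $P$ and $P_{(Y,Z)|Z}$ are almost surely equal, using the Uniqueness Theorem on page 27 of Bauer (1972). Remark that both $\mathcal{A}_1$ and $\mathcal{A}_2$ are countably generated, say by $\mathcal{A}_1^0$ and $\mathcal{A}_2^0$, so that $\sigma(\mathcal{A}_1 \times \mathcal{A}_2)$ is countably generated by $\mathcal{A}_1^0 \times \mathcal{A}_2^0$: every $A_1 \times A_2$ is an element of $\sigma(\mathcal{A}_1^0 \times \mathcal{A}_2^0)$, since $A_1 \times A_2 = (A_1 \times \mathcal{Z}) \cap (\mathcal{Y} \times A_2)$. To apply the Uniqueness Theorem we need a generator which is intersection-stable (i.e., finite intersections of elements in the generator are still in the generator). $\mathcal{A}_1^0 \times \mathcal{A}_2^0$ need not be intersection-stable, but as in the proof of Theorem 10.3.4 in Bauer (1972): when finite intersections of elements in $\mathcal{A}_1^0 \times \mathcal{A}_2^0$ are added to the generator it stays countable. Notice that these finite intersections are still of the form $A_1 \times A_2$ since $(A_1 \times A_2) \cap (B_1 \times B_2) = (A_1 \cap B_1) \times (A_2 \cap B_2)$, and notice moreover that this leads to a countable intersection-stable generator $\mathcal{A}_1^1 \times \mathcal{A}_2^1$. Because of the former paragraph, for all $A_1^1 \times A_2^1$ with $A_1^1 \in \mathcal{A}_1^1$ and $A_2^1 \in \mathcal{A}_2^1$, $P(\omega, A_1^1 \times A_2^1)$ is a version of $P((Y, Z) \in A_1^1 \times A_2^1 \,|\, Z)$, so $P(\omega, A_1^1 \times A_2^1) = P_{(Y,Z)|Z}(\omega, A_1^1 \times A_2^1)$ a.s. Hence because of the countability

$$\bigcup_{A_1^1 \times A_2^1 : A_1^1 \in \mathcal{A}_1^1, A_2^1 \in \mathcal{A}_2^1} \big\{ \omega : P(\omega, A_1^1 \times A_2^1) \neq P_{(Y,Z)|Z}(\omega, A_1^1 \times A_2^1) \big\}$$

is a null set. Thus the Uniqueness Theorem on page 27 of Bauer (1972) implies that indeed $P$ and $P_{(Y,Z)|Z}$ are equal except for on this null set. □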

Lemma A.S Suppose that Y has a continuous conditional distribution function Fy\z given 
Z. Then Fy\z (Y) is uniformly distributed on [0, 1] and independent of Z . 

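Proof. Because of Lemma A.5 it suffices to prove that for all $x \in [0,1]$,
$P(F_{Y|Z}(Y) \le x \mid Z) = x$ a.s. This can be done as follows. Define

$$F_{Y|Z}^{-1}(x+) = \sup\{y : F_{Y|Z}(y) \le x\}.$$

Then

$$P\left(Y \le F_{Y|Z}^{-1}(x+) \mid Z\right) = P\left(F_{Y|Z}(Y) \le x \mid Z\right),$$

since $F_{Y|Z}(Y) \le x$ implies that $Y \le \sup\{y : F_{Y|Z}(y) \le x\} = F_{Y|Z}^{-1}(x+)$ and since
$Y \le F_{Y|Z}^{-1}(x+) = \sup\{y : F_{Y|Z}(y) \le x\}$ implies that $F_{Y|Z}(Y) \le x$ by continuity of $F_{Y|Z}$. Hence

$$
\begin{aligned}
P\left(F_{Y|Z}(Y) \le x \mid Z = z\right)
&= P\left(Y \le F_{Y|Z}^{-1}(x+) \mid Z = z\right) \\
&= \int_{\{(y,z') :\, y \le F_{Y|Z=z'}^{-1}(x+)\}} P_{(Y,Z)|Z=z}(dy, dz') \\
&= \int_{\{(y,z') :\, y \le F_{Y|Z=z'}^{-1}(x+)\}} \delta_{z,z'}\, F_{Y|Z=z}(dy)\, dz' \\
&= \int_{\{y :\, y \le F_{Y|Z=z}^{-1}(x+)\}} F_{Y|Z=z}(dy) \\
&= F_{Y|Z=z}\left(F_{Y|Z=z}^{-1}(x+)\right) = x \quad \text{a.s.},
\end{aligned}
$$

where we use Lemma A.6 in the second line, Lemma A.7 in the third line, and continuity of
$F_{Y|Z}$ in the last line. □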
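
Lemma A.8 can be checked numerically in a toy setting. The sketch below is only an illustration under assumptions that are not part of the paper: $Z$ is taken to be a fair coin with values 0 and 1, and $Y$ given $Z = z$ is taken to be exponential with rate $1 + z$, so that $F_{Y|Z=z}(y) = 1 - e^{-(1+z)y}$. The names `cond_cdf` and `simulate_pit` are hypothetical. Within each stratum of $Z$ the transformed values $F_{Y|Z}(Y)$ should look uniform on $[0,1]$, which is what independence of $Z$ amounts to here.

```python
import math
import random


def cond_cdf(y, z):
    """F_{Y|Z=z}(y) for the assumed model Y | Z=z ~ Exponential(rate 1 + z)."""
    return 1.0 - math.exp(-(1.0 + z) * y) if y > 0 else 0.0


def simulate_pit(n, seed=0):
    """Draw (Y, Z) pairs and return the values F_{Y|Z}(Y), grouped by Z."""
    rng = random.Random(seed)
    transformed = {0: [], 1: []}
    for _ in range(n):
        z = rng.randint(0, 1)              # assumed covariate: fair coin
        y = rng.expovariate(1.0 + z)       # Y | Z=z ~ Exponential(1 + z)
        transformed[z].append(cond_cdf(y, z))
    return transformed


pit = simulate_pit(20000)
# Per stratum: (sample mean, fraction of values <= 0.25).
# For a uniform variable on [0, 1] these are 0.5 and 0.25.
summary = {
    z: (sum(u) / len(u), sum(v <= 0.25 for v in u) / len(u))
    for z, u in pit.items()
}
```

The same summary in both strata is consistent with $F_{Y|Z}(Y)$ being uniform and independent of $Z$; it is a sanity check, not a proof.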

Lemma A.9 Suppose that $X$ is uniformly distributed on $[0,1]$ and independent of $Z$ and
that $F_{Y|Z}$ is a conditional distribution function of $Y$ given $Z$. Then $F_{Y|Z}^{-1}(X)$ has conditional
distribution function $F_{Y|Z}$ given $Z$.

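Proof. Because of Lemma A.5 it suffices to prove that for all $s$,
$P(F_{Y|Z}^{-1}(X) \le s \mid Z) = F_{Y|Z}(s)$ a.s. This can be done as follows:

$$
\begin{aligned}
P\left(F_{Y|Z}^{-1}(X) \le s \mid Z = z\right)
&= P\left(F_{Y|Z}^{-1}(X) \le F_{Y|Z}^{-1} \circ F_{Y|Z}(s) \mid Z = z\right) \\
&= P\left(X \le F_{Y|Z}(s) \mid Z = z\right) \\
&= E\left[1_{\{X \le F_{Y|Z}(s)\}} \mid Z = z\right] \\
&= \int_{\{(x',z') :\, x' \le F_{Y|Z=z'}(s)\}} P_{(X,Z)|Z=z}(dx', dz') \\
&= \int_{\{(x',z') :\, x' \le F_{Y|Z=z'}(s)\}} \delta_{z,z'}\, F_{X|Z=z}(dx')\, dz' \\
&= \int_{\{x' :\, x' \le F_{Y|Z=z}(s)\}} F_{X|Z=z}(dx') \\
&= F_{Y|Z}(s) \quad \text{a.s.}
\end{aligned}
$$

In the second line we use that if $X \le F_{Y|Z}(s)$ then also $F_{Y|Z}^{-1}(X) \le F_{Y|Z}^{-1} \circ F_{Y|Z}(s)$, and
moreover that if $X > F_{Y|Z}(s)$ then also, since $F_{Y|Z}(s)$ is in the range of $F_{Y|Z}$ and conditional
distribution functions are right-continuous, $F_{Y|Z}^{-1}(X) > F_{Y|Z}^{-1} \circ F_{Y|Z}(s)$. In the fourth line we
use Lemma A.6, in the fifth line we use Lemma A.7, and in the last line we use that $X$ is
uniformly distributed on $[0,1]$ given $Z$. □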
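
Lemma A.9 is the conditional version of inverse transform sampling. The following sketch illustrates it under assumptions made only for this example: $Z$ is a fair coin, and the target conditional distribution is exponential with rate $1 + z$, for which the inverse of $F_{Y|Z=z}$ has the closed form $-\log(1-p)/(1+z)$. The names `cond_cdf`, `cond_quantile` and `empirical_cdf_at` are hypothetical.

```python
import math
import random


def cond_cdf(s, z):
    """F_{Y|Z=z}(s) for the assumed model Y | Z=z ~ Exponential(rate 1 + z)."""
    return 1.0 - math.exp(-(1.0 + z) * s) if s > 0 else 0.0


def cond_quantile(p, z):
    """Inverse of s -> F_{Y|Z=z}(s) for p in [0, 1)."""
    return -math.log(1.0 - p) / (1.0 + z)


def empirical_cdf_at(s, n, seed=1):
    """Estimate P(F^{-1}_{Y|Z}(X) <= s | Z = z) for z in {0, 1} by simulation."""
    rng = random.Random(seed)
    hits = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        z = rng.randint(0, 1)      # assumed covariate, drawn independently of X
        x = rng.random()           # X uniform on [0, 1), so 1 - x > 0
        y = cond_quantile(x, z)
        counts[z] += 1
        hits[z] += y <= s
    return {z: hits[z] / counts[z] for z in (0, 1)}


estimate = empirical_cdf_at(0.5, 20000)
target = {z: cond_cdf(0.5, z) for z in (0, 1)}
```

In each stratum the simulated frequency in `estimate` should be close to the value of $F_{Y|Z=z}(0.5)$ stored in `target`.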
Lemma A.10 If $X$ is a random variable taking values in $\mathbb{R}$ and for every bounded Lipschitz
continuous function $f : \mathbb{R} \to \mathbb{R}$

$$E[f(X) \mid Z] = E[f(Y) \mid Z] \quad \text{a.s.}$$

then $X$ has the same conditional distribution as $Y$ given $Z$.

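Proof. Because of Lemma A.5 it suffices to show that for every $x \in \mathbb{R}$,
$P(X \le x \mid Z) = P(Y \le x \mid Z)$ a.s.

Analogously to a proof of the Portmanteau Lemma put

$$f_m(y) = m\, d\left(y, (-\infty, x]\right) \wedge 1$$

for $m = 1, 2, \ldots$. Then $0 \le f_m \uparrow 1_{(x,\infty)}$ as $m \to \infty$ and $f_m$ is bounded and Lipschitz, so that
$E[f_m(X) \mid Z] = E[f_m(Y) \mid Z]$ a.s. The remaining part is straightforward:

$$
\begin{aligned}
P(X \le x \mid Z) &= E\left[1_{(-\infty,x]}(X) \mid Z\right] \\
&= E\left[1 - 1_{(x,\infty)}(X) \mid Z\right] \\
&= 1 - E\left[1_{(x,\infty)}(X) \mid Z\right] \quad \text{a.s.} \\
&= 1 - \lim_{m \to \infty} E\left[f_m(X) \mid Z\right] \quad \text{a.s.} \\
&= 1 - \lim_{m \to \infty} E\left[f_m(Y) \mid Z\right] \quad \text{a.s.} \\
&= P(Y \le x \mid Z) \quad \text{a.s.},
\end{aligned}
$$

where in the fourth line the conditional Monotone Convergence Theorem (see e.g. Fabius and van Zwet, 1975) is used. □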
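
The approximating functions $f_m$ in the proof above are easy to write down explicitly on the real line, where the distance from $y$ to $(-\infty, x]$ is $\max(y - x, 0)$. The sketch below, with the hypothetical names `f_m` and `indicator_right`, tabulates $f_m$ at a few points to show the monotone convergence to the indicator of $(x, \infty)$; the evaluation points are arbitrary choices for illustration.

```python
def f_m(y, x, m):
    """m * d(y, (-inf, x]) capped at 1; bounded and Lipschitz in y with constant m."""
    distance = max(y - x, 0.0)   # distance from y to the half-line (-inf, x]
    return min(m * distance, 1.0)


def indicator_right(y, x):
    """Indicator of the open half-line (x, inf), the pointwise limit of f_m."""
    return 1.0 if y > x else 0.0


x = 0.0
ys = [-1.0, 0.0, 0.001, 0.05, 0.5, 2.0]
# Rows indexed by m; each row lists f_m at the points in ys.
table = {m: [f_m(y, x, m) for y in ys] for m in (1, 10, 100, 10000)}
limit = [indicator_right(y, x) for y in ys]
```

For every fixed point the values increase with $m$ and eventually equal the indicator, which is the property the conditional Monotone Convergence Theorem needs.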



We end Appendix A with a statement about conditioning on increasing families of $\sigma$-algebras.

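Lemma A.11 Let $X$ be a random variable with $E|X| < \infty$, and let $\overline{Z}_t$ be a random variable
with values in $\overline{\mathcal{Z}}_t$, the space of cadlag functions on $[0,t]$ provided with the projection $\sigma$-algebra,
with $P(Z \text{ jumps at } t) = 0$. Then any version of $E\left[X \mid \sigma\left(\cup_{n=1}^{\infty} \overline{Z}_t^{(n)}\right)\right]$, with $\overline{Z}_t^{(n)}$ as defined
earlier in the paper, is also a version of $E\left[X \mid \overline{Z}_t\right]$.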

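Proof. Remark that $\sigma(\overline{Z}_t^{(n)})$ is increasing in $n$, and that for $t$ on the infinite grid $\sigma(\cup_{n=1}^{\infty} \overline{Z}_t^{(n)}) = \sigma(\overline{Z}_t)$
and for $t$ not on the infinite grid $\sigma(\cup_{n=1}^{\infty} \overline{Z}_t^{(n)}) = \sigma(\overline{Z}_{t-})$, where $\overline{Z}_{t-} = (Z(s) : s < t)$. But
the probability that $Z$ jumps at $t$ is equal to zero. Therefore $E[X \mid \overline{Z}_t] = E[X \mid \overline{Z}_{t-}]$ a.s.:
any version of $E[X \mid \overline{Z}_{t-}]$ is a version of $E[X \mid \overline{Z}_t]$. This can be seen as follows. $E[X \mid \overline{Z}_{t-}]$ is
trivially $\sigma(\overline{Z}_t)$-measurable. So it still has to be checked that for any measurable $f : \overline{\mathcal{Z}}_t \to \mathbb{R}$
for which $E(|X f(\overline{Z}_t)|) < \infty$, $E(X f(\overline{Z}_t)) = E(E[X \mid \overline{Z}_{t-}] f(\overline{Z}_t))$. So let such $f$ be given.
Define $g : \overline{\mathcal{Z}}_{t-} \to \overline{\mathcal{Z}}_t$ as the "continuous" extension:

$$
g(\overline{z}_{t-})(s) =
\begin{cases}
z(s) & \text{if } s < t \\
\lim_{u \uparrow t} z(u) & \text{if } s = t.
\end{cases}
$$

Then

$$
\begin{aligned}
E\left(X f(\overline{Z}_t)\right) &= E\left(X f(g(\overline{Z}_{t-}))\right) \\
&= E\left(E[X \mid \overline{Z}_{t-}]\, f(g(\overline{Z}_{t-}))\right) \\
&= E\left(E[X \mid \overline{Z}_{t-}]\, f(\overline{Z}_t)\right),
\end{aligned}
$$

where in the first and the last line we use that the probability that $Z$ jumps at $t$ is equal to
zero and in the second line the defining property of $E[X \mid \overline{Z}_{t-}]$ as a conditional expectation given $\overline{Z}_{t-}$. Therefore the conditional
expectation of $X$ given $\overline{Z}_t$ is almost surely equal to the conditional expectation of $X$ given $\overline{Z}_{t-}$. □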

B Some theory about differential equations 

Theorem B.1 Suppose that a function $D(y, t; \overline{Z}_t)$ satisfies

a) (continuity between the jump times of $Z$). If $Z$ does not jump in $(t_1, t_2)$ then $D(y, t; \overline{Z}_t)$
is continuous in $(y,t)$ on $[t_1, t_2)$ and can be continuously extended to $[t_1, t_2]$.

b) (Lipschitz continuity). For each $\omega \in \Omega$ there exists a constant $L(\omega)$ such that

$$\left|D(y, t; \overline{Z}_t) - D(z, t; \overline{Z}_t)\right| \le L(\omega)\, |y - z|$$

for all $t \in [0, \tau]$ and all $y, z$.

Suppose furthermore that for each $\omega \in \Omega$ there are no more than finitely many jump times of
$Z$. Then, for each $t_0 \in [0, \tau]$ and $y_0 \in \mathbb{R}$ there is a unique continuous solution $x(t; t_0, y_0)$ to

$$x'(t) = D(x(t), t; \overline{Z}_t)$$

with boundary condition $x(t_0) = y_0$ and this solution is defined on the whole interval $[0, \tau]$.

This theorem follows from well-known results about differential equations, see e.g. Duistermaat and Eckhaus
(1995), Chapter 2.
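
The way Theorem B.1 is obtained, by solving between consecutive jump times of $Z$ and gluing the pieces together continuously, can be sketched numerically. The example below rests on assumptions chosen only for illustration: $D(y, t; \overline{Z}_t) = -Z(t)\, y$ depends on the history only through the current value of a step function $Z$, which equals 1 on $[0,1)$ and 3 on $[1,2]$, and $\tau = 2$. The solver is a standard fourth-order Runge-Kutta scheme; the names `rk4` and `solve_piecewise` are hypothetical.

```python
import math


def rk4(rhs, t0, y0, t1, steps=200):
    """Classical Runge-Kutta integration of y' = rhs(t, y) from t0 to t1."""
    h = (t1 - t0) / steps
    y = y0
    for i in range(steps):
        t = t0 + i * h
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h * k1 / 2)
        k3 = rhs(t + h / 2, y + h * k2 / 2)
        k4 = rhs(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y


def solve_piecewise(jump_times, levels, y0, tau):
    """Solve x' = -Z(t) x on [0, tau], restarting the solver at each jump of Z.

    levels[i] is the (assumed constant) value of Z between consecutive knots.
    """
    knots = [0.0] + list(jump_times) + [tau]
    y = y0
    for a, b, level in zip(knots[:-1], knots[1:], levels):
        # On [a, b) the right-hand side is continuous and Lipschitz in y.
        y = rk4(lambda t, x, c=level: -c * x, a, y, b)
    return y


x_tau = solve_piecewise([1.0], [1.0, 3.0], 1.0, 2.0)
exact = math.exp(-1.0) * math.exp(-3.0)   # closed-form solution at tau = 2
```

In this toy case the Lipschitz constant can be taken as $L = 3$, and the numerical value `x_tau` agrees with the closed-form solution `exact` up to discretisation error.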

For the next theorem we also refer to Duistermaat and Eckhaus (1995), Chapter 2. It is a
consequence of Gronwall's lemma.

Theorem B.2 Suppose that $I$ is an open or closed interval in $\mathbb{R}$, $f : I \times \mathbb{R}^n \to \mathbb{R}^n$ is
continuous and $C : I \to [0, \infty)$ is continuous, and suppose that

$$\| f(x,y) - f(x,z) \| \le C(x)\, \| y - z \| \qquad (44)$$

for all $x \in I$ and $y, z \in \mathbb{R}^n$. Then, for every $x_0 \in I$ and $y_0 \in \mathbb{R}^n$, there is a unique solution
$y(x)$ of $y'(x) = f(x, y(x))$ with $y(x_0) = y_0$, and this solution is defined for all $x \in I$. If
$g : I \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous and $z : I \to \mathbb{R}^n$ is a solution of $z'(x) = g(x, z(x))$ then

$$\| y(x) - z(x) \| \le e^{\int_{x_0}^{x} C(\xi)\, d\xi}\, \| y(x_0) - z(x_0) \| + \int_{x_0}^{x} e^{\int_{\xi}^{x} C(\eta)\, d\eta}\, \| f(\xi, z(\xi)) - g(\xi, z(\xi)) \|\, d\xi$$

for all $x, x_0 \in I$ with $x_0 \le x$.
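
The bound in Theorem B.2 can be checked numerically in a scalar example. Everything specific below is an assumption made for illustration: $f(x,y) = \sin y + x$, which satisfies (44) with $C \equiv 1$, a perturbed right-hand side $g = f + 0.1 \cos x$, the interval $[0, 2]$, and initial values $y(0) = 0$ and $z(0) = 0.2$. The solutions are approximated with a Runge-Kutta scheme and the integral in the bound with the trapezoid rule, so the comparison holds only up to discretisation error. The names `rk4_path` and `bound_at_end` are hypothetical.

```python
import math


def f(x, y):
    """Assumed right-hand side; Lipschitz in y with C(x) = 1."""
    return math.sin(y) + x


def g(x, y):
    """Assumed perturbation of f."""
    return f(x, y) + 0.1 * math.cos(x)


def rk4_path(rhs, x0, y0, x1, steps):
    """Runge-Kutta approximation of y' = rhs(x, y); returns the grid and the path."""
    h = (x1 - x0) / steps
    xs, ys = [x0], [y0]
    y = y0
    for i in range(steps):
        x = x0 + i * h
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, y + h * k1 / 2)
        k3 = rhs(x + h / 2, y + h * k2 / 2)
        k4 = rhs(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        xs.append(x0 + (i + 1) * h)
        ys.append(y)
    return xs, ys


def bound_at_end(xs, ys, zs):
    """Right-hand side of the Theorem B.2 bound at the last grid point, for C = 1."""
    x_end, h = xs[-1], xs[1] - xs[0]
    first = math.exp(x_end - xs[0]) * abs(ys[0] - zs[0])
    integrand = [math.exp(x_end - x) * abs(f(x, z) - g(x, z)) for x, z in zip(xs, zs)]
    second = h * (sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))  # trapezoid rule
    return first + second


xs, ys = rk4_path(f, 0.0, 0.0, 2.0, 400)
_, zs = rk4_path(g, 0.0, 0.2, 2.0, 400)
gap = abs(ys[-1] - zs[-1])
bound = bound_at_end(xs, ys, zs)
```

Here `gap` approximates $|y(2) - z(2)|$ and `bound` approximates the right-hand side of the inequality; the bound is typically far from sharp, since it replaces the actual local behaviour of $f$ by its worst-case Lipschitz constant.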

In Duistermaat and Eckhaus (1995) the interval is always an open interval, but as is generally
known this can be overcome by extending both $f$ and $g$ outside the closed interval
$I$ by taking the values at the boundary of $I$. This preserves the Lipschitz and continuity
conditions. Existence and uniqueness on each of finitely many intervals implies global existence
and uniqueness; this is the way we will often apply this theorem.

We have a differential equation with end condition at $\tau$, so we are interested in $x, x_0$ with
$x \le x_0$. The following corollary can be used.

Corollary B.3 Suppose that the conditions of Theorem B.2 are satisfied. Then, for every
$x_0 \in I$ and $y_0 \in \mathbb{R}^n$, there is a unique solution $y(x)$ of $y'(x) = f(x, y(x))$ with $y(x_0) = y_0$,
and this solution is defined for all $x \in I$. If $g : I \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous and $z : I \to \mathbb{R}^n$ is
a solution of $z'(x) = g(x, z(x))$ then

$$\| y(x) - z(x) \| \le e^{\int_{x}^{x_0} C(s)\, ds}\, \| y(x_0) - z(x_0) \| + \int_{x}^{x_0} e^{\int_{x}^{s} C(\eta)\, d\eta}\, \| f(s, z(s)) - g(s, z(s)) \|\, ds$$

for all $x, x_0 \in I$ with $x \le x_0$.

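Proof. Put $\tilde{y}(t) = y(x_0 - t)$. Then

$$
\begin{aligned}
\tilde{y}'(t) &= \frac{d}{dt}\, y(x_0 - t) \\
&= -y'(x_0 - t) \\
&= -f(x_0 - t, y(x_0 - t)) \\
&= -f(x_0 - t, \tilde{y}(t)) \\
&= \tilde{f}(t, \tilde{y}(t)),
\end{aligned}
$$

where $\tilde{f}(t, y) = -f(x_0 - t, y)$. So $\tilde{y}(t) = y(x_0 - t)$ is a solution of the differential equation
$\tilde{y}'(t) = \tilde{f}(t, \tilde{y}(t))$ with boundary condition $\tilde{y}(0) = y(x_0) = y_0$. In the same way, $\tilde{z}(t) = z(x_0 - t)$ solves
$\tilde{z}'(t) = \tilde{g}(t, \tilde{z}(t))$ with $\tilde{g}(t, y) = -g(x_0 - t, y)$. Applying Theorem B.2 on $\tilde{y}$
concludes the proof, as follows.

$$
\begin{aligned}
\| y(x) - z(x) \| &= \| y(x - x_0 + x_0) - z(x - x_0 + x_0) \| \\
&= \| y(x_0 - (x_0 - x)) - z(x_0 - (x_0 - x)) \| \\
&= \| \tilde{y}(x_0 - x) - \tilde{z}(x_0 - x) \| \\
&= \| \tilde{y}(t) - \tilde{z}(t) \|
\end{aligned}
$$

with $t = x_0 - x \ge 0$. Notice that because of equation (44)

$$\| \tilde{f}(t, y) - \tilde{f}(t, z) \| \le C(x_0 - t)\, \| y - z \| =: \tilde{C}(t)\, \| y - z \|,$$

with $\tilde{C}(t) = C(x_0 - t)$. Hence Theorem B.2 implies that

$$
\begin{aligned}
\| y(x) - z(x) \| &\le e^{\int_0^t \tilde{C}(\eta)\, d\eta}\, \| \tilde{y}(0) - \tilde{z}(0) \| + \int_0^t e^{\int_{\xi}^{t} \tilde{C}(\eta)\, d\eta}\, \| \tilde{f}(\xi, \tilde{z}(\xi)) - \tilde{g}(\xi, \tilde{z}(\xi)) \|\, d\xi \\
&= e^{\int_0^t C(x_0 - \eta)\, d\eta}\, \| y(x_0 - 0) - z(x_0 - 0) \| + \int_0^t e^{\int_{\xi}^{t} C(x_0 - \eta)\, d\eta}\, \| f(x_0 - \xi, z(x_0 - \xi)) - g(x_0 - \xi, z(x_0 - \xi)) \|\, d\xi.
\end{aligned}
$$

For the first term we do a change of variables: put $s = x_0 - \eta$, so $d\eta = -ds$. As $\eta$ runs from $0$ to $t$,
$s$ runs from $x_0 - 0$ to $x_0 - t = x_0 - (x_0 - x) = x$. We conclude that the first term is
equal to

$$e^{-\int_{x_0}^{x} C(s)\, ds}\, \| y(x_0) - z(x_0) \| = e^{\int_{x}^{x_0} C(s)\, ds}\, \| y(x_0) - z(x_0) \|.$$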

For the second term similar changes of variables can be done, resulting in Corollary B.3. □

Corollary B.4 Suppose that $I$ is a closed interval in $\mathbb{R}$, $f : I \times [y_1, y_2] \to \mathbb{R}$ is continuous
with for all $x \in I$, $f(x, y_1) = f(x, y_2) = 0$ and $C : I \to [0, \infty)$ is continuous, and suppose
that

$$| f(x,y) - f(x,z) | \le C(x)\, | y - z | \qquad (45)$$

for all $x \in I$ and $y, z \in [y_1, y_2]$. Then, for every $x_0 \in I$ and $y_0 \in [y_1, y_2]$, there is a unique
solution $y(x)$ of $y'(x) = f(x, y(x))$ with $y(x_0) = y_0$, and this solution is defined for all $x \in I$.
Furthermore $y(x) \in [y_1, y_2]$ for all $x \in I$. Suppose that $g : I \times [y_1, y_2] \to \mathbb{R}$ is continuous and
$z : I \to [y_1, y_2]$ is a solution of $z'(x) = g(x, z(x))$; then

$$| y(x) - z(x) | \le e^{\int_{x_0}^{x} C(s)\, ds}\, | y(x_0) - z(x_0) | + \int_{x_0}^{x} e^{\int_{s}^{x} C(\eta)\, d\eta}\, | f(s, z(s)) - g(s, z(s)) |\, ds$$

for all $x, x_0 \in I$ with $x_0 \le x$ and

$$| y(x) - z(x) | \le e^{\int_{x}^{x_0} C(s)\, ds}\, | y(x_0) - z(x_0) | + \int_{x}^{x_0} e^{\int_{x}^{s} C(\eta)\, d\eta}\, | f(s, z(s)) - g(s, z(s)) |\, ds$$

for all $x, x_0 \in I$ with $x \le x_0$.

Proof. Write $I = [x_1, x_2]$. In order to apply Theorem B.2 define an extension $\tilde{f} : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$
of $f$ as follows:

$$
\tilde{f}(x,y) =
\begin{cases}
f(x,y) & \text{if } (x,y) \in I \times [y_1, y_2] \\
f(x_1, y) & \text{if } (x,y) \in (-\infty, x_1] \times [y_1, y_2] \\
f(x_2, y) & \text{if } (x,y) \in (x_2, \infty) \times [y_1, y_2] \\
\tilde{f}(x, y_1) & \text{if } y < y_1 \\
\tilde{f}(x, y_2) & \text{if } y > y_2.
\end{cases}
$$

If there exists a unique solution of the differential equation with $\tilde{f}$ and this solution stays in
$[y_1, y_2]$, then this solution is also the unique continuous solution of the differential equation
with $f$.

On the differential equation with $\tilde{f}$ Theorem B.2 will be applied. $\tilde{f}$ is continuous on $\mathbb{R} \times \mathbb{R}$
because $f$ is continuous on $I \times [y_1, y_2]$ and $f(x, y_1) = 0 = f(x, y_2)$ for every $x \in I$. Also there
exists a continuous $\tilde{C}$ satisfying equation (44): define $\tilde{C}$ as an extension of $C$ as follows:

$$
\tilde{C}(x) =
\begin{cases}
C(x) & \text{if } x \in I \\
C(x_1) & \text{if } x \in (-\infty, x_1) \\
C(x_2) & \text{if } x \in (x_2, \infty).
\end{cases}
$$

That this $\tilde{C}$ satisfies equation (44) can easily be checked by first considering $x \in [x_1, x_2]$ and
reducing different $x$ to $x_1$ and $x_2$.

Thus Theorem B.2 implies that there is a unique solution of the differential equation with
$\tilde{f}$. That the solution stays in $[y_1, y_2]$ is clear from the fact that $\tilde{f} = 0$ for $y \in \{y_1, y_2\}$ and the
fact that the solution is unique.

Since $g$ can be extended the same way as $f$ and $z$ stays in $[y_1, y_2]$ by assumption, the
bound for $|y(x) - z(x)|$ given by Theorem B.2 also holds here. The second bound follows
with the same reasoning from Corollary B.3 in the appendix. This finishes the proof. □
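
The extension used in this proof amounts to clamping both arguments to the rectangle $I \times [y_1, y_2]$ before evaluating $f$. A minimal sketch follows; the helper name `extend` and the example function are assumptions made for illustration, with $I = [0,1]$, $[y_1, y_2] = [0,1]$ and $f(x,y) = (1+x)(y - y_1)(y_2 - y)$, which vanishes at $y_1$ and $y_2$ as the corollary requires.

```python
def extend(f, x1, x2, y1, y2):
    """Return the extension of f to the whole plane obtained by clamping (x, y)."""
    def f_ext(x, y):
        x_clamped = min(max(x, x1), x2)   # use the boundary value of f outside I
        y_clamped = min(max(y, y1), y2)   # use f(., y1) below and f(., y2) above
        return f(x_clamped, y_clamped)
    return f_ext


X1, X2, Y1, Y2 = 0.0, 1.0, 0.0, 1.0


def f(x, y):
    """Assumed example: continuous on the rectangle and zero at y = Y1 and y = Y2."""
    return (1.0 + x) * (y - Y1) * (Y2 - y)


f_ext = extend(f, X1, X2, Y1, Y2)
samples = {
    "inside": f_ext(0.3, 0.4),
    "left of I": f_ext(-5.0, 0.5),
    "right of I": f_ext(3.0, 0.5),
    "above y2": f_ext(0.5, 2.0),
    "below y1": f_ext(0.5, -3.0),
}
```

Because $f$ vanishes on the lines $y = y_1$ and $y = y_2$, the clamped function is zero above and below the rectangle, so a solution started inside $[y_1, y_2]$ cannot leave it.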
Corollary B.5 Suppose that $I = [x_1, x_2] \subset [0, y_2]$ is a closed interval in $\mathbb{R}$,
$f : \{(x,y) \in I \times [0, y_2] : y \ge x\} \to \mathbb{R}$ is continuous with for all $x \in I$, $f(x, y_2) = 0$ and
$f(x, x) \le 1$ and $C : I \to [0, \infty)$ is continuous, and suppose that

$$| f(x,y) - f(x,z) | \le C(x)\, | y - z |$$

for all $x \in I$ and $y, z \in [x, y_2]$. Then for every $y_0 \in [x_2, y_2]$ there is a unique solution $y(x)$ of
$y'(x) = f(x, y(x))$ with final condition $y(x_2) = y_0$, and this solution is defined for all $x \in I$.
Furthermore $y(x) \in [x, y_2]$ for all $x \in I$. Suppose that $g : \{(x,y) \in I \times [0, y_2] : y \ge x\} \to \mathbb{R}$ is
continuous and $z : I \to [0, y_2]$ is a solution of $z'(x) = g(x, z(x))$ with $z(x) \ge x$; then

$$| y(x) - z(x) | \le e^{\int_{x}^{x_2} C(s)\, ds}\, | y(x_2) - z(x_2) | + \int_{x}^{x_2} e^{\int_{x}^{s} C(\eta)\, d\eta}\, | f(s, z(s)) - g(s, z(s)) |\, ds$$

for all $x \in I$.

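Proof. This can be proved the same way as Corollary B.4 if one defines

$$
\tilde{f}(x,y) =
\begin{cases}
f(x,y) & \text{if } (x,y) \in I \times [0, y_2] \text{ and } y \ge x \\
f(x, y_2) & \text{if } x \in I \text{ and } y > y_2 \\
f(x, x) & \text{if } x \in I \text{ and } y < x \\
\tilde{f}(x_1, y) & \text{if } x < x_1 \\
\tilde{f}(x_2, y) & \text{if } x > x_2.
\end{cases}
$$

Remark that the solution $y(x)$ stays in $\{(x,y) \in I \times [0, y_2] : y \ge x\}$ for $x \in I$ since $f(x,x) \le 1$
and $f(x, y_2) = 0$. □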

Remark that if it is not known whether $f(x,x) \le 1$ but it is known that $f$ is continuous in
$(x,y)$ and Lipschitz continuous in $y$ on the set mentioned in Corollary B.5, then the proof
above shows that if a solution $y(x)$ exists for which $y(x) \ge x$ for $x \le x_2$ then this solution is
unique.



C Lipschitz continuity and differentiability 

The following lemma can be useful for proving Lipschitz continuity of quotients of functions. 

Lemma C.1 Suppose that $f$ and $g$ are functions from $\mathbb{R}$ to $\mathbb{R}$ which are Lipschitz continuous
with Lipschitz constants $L_f$ resp. $L_g$. Suppose furthermore that $g \ge \varepsilon$ for some $\varepsilon > 0$ and
$|f| \le C$ for some $C > 0$. Then $f/g$ is Lipschitz continuous with Lipschitz constant e.g.
$L_f/\varepsilon + C L_g/\varepsilon^2$.
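
A quick numerical sanity check of the constant in Lemma C.1, for one assumed pair of functions that is not taken from the paper: $f(x) = \sin x$ (so $L_f = 1$ and $C = 1$) and $g(x) = 2 + \cos x$ (so $L_g = 1$ and $\varepsilon = 1$). The lemma then gives the Lipschitz constant $1/1 + 1 \cdot 1/1^2 = 2$ for $f/g$. The sketch compares this with the largest difference quotient of $f/g$ over neighbouring points of a grid on $[0, 2\pi]$; the names `quotient` and `largest_slope` are hypothetical.

```python
import math

L_F, L_G, C_BOUND, EPS = 1.0, 1.0, 1.0, 1.0           # constants for the assumed f and g
lipschitz_bound = L_F / EPS + C_BOUND * L_G / EPS**2   # constant from Lemma C.1


def quotient(x):
    """f/g for the assumed f(x) = sin x and g(x) = 2 + cos x >= 1."""
    return math.sin(x) / (2.0 + math.cos(x))


def largest_slope(func, a, b, n):
    """Largest absolute difference quotient of func between neighbouring grid points."""
    grid = [a + (b - a) * i / n for i in range(n + 1)]
    return max(
        abs(func(u) - func(v)) / (v - u)
        for u, v in zip(grid[:-1], grid[1:])
    )


observed = largest_slope(quotient, 0.0, 2.0 * math.pi, 2000)
```

In this example the observed slope is close to 1, well below the constant 2 from the lemma, which is in line with the statement that the lemma gives one valid constant ("e.g.") rather than the smallest one.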



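Proof.

$$
\begin{aligned}
\left| \frac{f(x_1)}{g(x_1)} - \frac{f(x_2)}{g(x_2)} \right|
&\le \left| \frac{f(x_1)}{g(x_1)} - \frac{f(x_2)}{g(x_1)} \right| + \left| \frac{f(x_2)}{g(x_1)} - \frac{f(x_2)}{g(x_2)} \right| \\
&= \frac{1}{g(x_1)}\, | f(x_1) - f(x_2) | + \left| \frac{f(x_2)}{g(x_1)\, g(x_2)} \right| | g(x_1) - g(x_2) | \\
&\le \frac{1}{\varepsilon}\, L_f\, | x_1 - x_2 | + \frac{C}{\varepsilon^2}\, L_g\, | x_1 - x_2 |.
\end{aligned}
$$

□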



The next lemma deals with a continuous function $f$ on a closed interval which is continuously
differentiable on the interior of that interval. If $f'$ can be continuously extended to the
closed interval, then $f$ is continuously differentiable on the closed interval.

Lemma C.2 Suppose that $f$ is continuous on $[t_1, t_2]$ and $f$ is continuously differentiable
on $(t_1, t_2)$. Suppose furthermore that $f'$ has a continuous extension to $[t_1, t_2]$. Then $f$ is
differentiable from the right at $t_1$ with derivative $\lim_{t \downarrow t_1} f'(t)$ and differentiable from the left
at $t_2$ with derivative $\lim_{t \uparrow t_2} f'(t)$.

Proof. We just prove the statements for $t_1$; the proof for $t_2$ is similar. Define $g(t) =
f(t_1) + \int_{t_1}^{t} f'(x)\, dx$. Then $g$ is continuous and continuously differentiable on $[t_1, t_2)$ with
derivative $f'(t)$ on $(t_1, t_2)$ and $\lim_{t \downarrow t_1} f'(t)$ at $t_1$. It suffices to show that $f = g$ on $[t_1, t_2)$,
since then $f$ has the same properties as $g$ on $[t_1, t_2)$. To do this, remark first that $f - g$ is
constant on $(t_1, t_2)$ since it is differentiable there with derivative 0. Because $f - g$ is continuous
on $[t_1, t_2)$, $f - g$ is also constant on $[t_1, t_2)$. Moreover $(f - g)(t_1) = 0$. Thus $f = g$ on $[t_1, t_2)$. □



D Convergence Theorems 

A lemma with a corollary: 

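Lemma D.1 Suppose that the random functions $f_n^{\omega} : [y_1, y_2] \to \mathbb{R}$ and $g_n^{\omega} : [y_1, y_2] \to \mathbb{R}$
($n = 1, 2, \ldots$) are 'asymptotically uniformly equicontinuous with probability one', i.e. there
exists $\Omega'$ with $P(\Omega') = 1$ such that for all $\omega \in \Omega'$: $\forall \varepsilon > 0$ $\exists \delta > 0$ $\exists N$: $\forall n \ge N$:

$$
| y - z | < \delta \;\Rightarrow\;
\begin{cases}
| f_n^{\omega}(y) - f_n^{\omega}(z) | < \varepsilon \\
| g_n^{\omega}(y) - g_n^{\omega}(z) | < \varepsilon.
\end{cases}
$$

Suppose furthermore that for all $y \in (y_1, y_2) \cap \mathbb{Q}$, $f_n^{\omega}(y) - g_n^{\omega}(y) \to 0$ a.s. Then

$$\sup_{y \in [y_1, y_2]} | f_n^{\omega}(y) - g_n^{\omega}(y) | \to 0 \quad \text{a.s.}$$

Remark: the regularity condition for Lemma D.1 is e.g. satisfied if there is a Lipschitz constant
$L$ such that all $f_n^{\omega}$ and $g_n^{\omega}$ are Lipschitz continuous in $y$ with Lipschitz constant $L$ (put
$\delta = \varepsilon / L$).

Proof. Put $\Omega'' = \{\omega : f_n^{\omega}(y) - g_n^{\omega}(y) \to 0 \ \forall y \in \mathbb{Q} \cap (y_1, y_2)\}$. Then $\Omega''$ has probability one
(one minus countably many null sets). Put $\Omega_0 = \Omega' \cap \Omega''$. Then also $\Omega_0$ has probability one. We
show that for all $\omega \in \Omega_0$: $\sup_{y \in [y_1, y_2]} | f_n^{\omega}(y) - g_n^{\omega}(y) | \to 0$.

So let $\omega \in \Omega_0$ and $\varepsilon > 0$ be given. To show: there exists an $N$ such that $\forall n \ge N$:
$\sup_{y \in [y_1, y_2]} | f_n^{\omega}(y) - g_n^{\omega}(y) | < \varepsilon$. Choose $N_1$ and $\delta > 0$ such that for all $n \ge N_1$:

$$
| y - z | < \delta \;\Rightarrow\;
\begin{cases}
| f_n^{\omega}(y) - f_n^{\omega}(z) | < \varepsilon / 3 \\
| g_n^{\omega}(y) - g_n^{\omega}(z) | < \varepsilon / 3.
\end{cases}
$$

This is possible because $\omega \in \Omega'$. Next choose $y^{(1)}, \ldots, y^{(N_2)} \in \mathbb{Q} \cap (y_1, y_2)$ such that for all
$y \in [y_1, y_2]$ there is a $y^{(i)}$ with $| y - y^{(i)} | < \delta$. After this choose $N_3$ such that for all $n \ge N_3$:

$$\max_{1 \le i \le N_2} | f_n^{\omega}(y^{(i)}) - g_n^{\omega}(y^{(i)}) | < \varepsilon / 3.$$

This is possible because $\omega \in \Omega''$ and the number of $y^{(i)}$'s is finite. Then for $n \ge N =
\max\{N_3, N_1\}$:

$$
\begin{aligned}
| f_n^{\omega}(y) - g_n^{\omega}(y) | &\le \min_{i} \left( | f_n^{\omega}(y) - f_n^{\omega}(y^{(i)}) | + | g_n^{\omega}(y^{(i)}) - g_n^{\omega}(y) | \right) + \max_{j} | f_n^{\omega}(y^{(j)}) - g_n^{\omega}(y^{(j)}) | \\
&< \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon.
\end{aligned}
$$

□

Corollary D.2 Under the conditions of Lemma D.1, if $X_n$ is a series of random variables
with values in $[y_1, y_2]$,

$$| f_n^{\omega}(X_n) - g_n^{\omega}(X_n) | \to 0 \quad \text{a.s.}$$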

We include a well-known corollary of Lebesgue's Dominated Convergence Theorem, to 
have the precise conditions at hand. 

Theorem D.3 Suppose that $\mu$ is a measure on a measurable space $(X, \mathcal{A})$. Suppose that for
all $t \in [a, b]$, $f_t$ is a measurable function on $X$. Suppose furthermore that for every $t$, $\frac{\partial}{\partial t} f_t(\cdot)$
exists. Suppose that there exists an integrable $g$ such that for all $t$: $\left| \frac{\partial}{\partial t} f_t \right| \le g$ and that $f_{t_0}$ is
integrable for some $t_0 \in [a, b]$. In that case $f_t$ is integrable for all $t$ and